ABSTRACT

We describe a new method that can measure the true redshift distribution of any set of objects 
that are studied only photometrically. Measuring the angular cross-correlation between objects in the 
photometric sample with objects in some spectroscopic sample as a function of the spectroscopic z, 
along with other, standard correlation measurements, provides sufficient information to reconstruct 
the redshift distribution of the photometric sample. The spectroscopic sample need not resemble the 
photometric sample in galaxy properties, but must fall within its sky coverage. We test this hybrid, 
photometric-spectroscopic cross-correlation technique with Monte Carlo simulations based on realistic 
error estimates (including sample variance). RMS errors in recovering both the mean redshift and
$\sigma$ of the redshift distribution for a single photometric redshift bin with true distribution given by a
Gaussian are $1.4 \times 10^{-3}\,(\sigma_z/0.1)\,(\Sigma_p/10)^{-0.3}\,[(dN_s/dz)/25{,}000]^{-1/2}$, where $\sigma_z$ is the true Gaussian $\sigma$,
$\Sigma_p$ is the surface density of the photometric sample in galaxies/arcmin$^2$, and $dN_s/dz$ is the number of
galaxies with a spectroscopic redshift per unit $z$. We test the impact of non-Gaussian redshift outliers
and of systematic errors due to unaccounted-for bias evolution, errors in measuring autocorrelations, 
photometric zero point variations, or mistaken cosmological assumptions, and find that none will 
dominate measurement uncertainties in reasonable scenarios. The true redshift distributions of even 
arbitrarily faint photometric samples may be determined to the precision required by proposed dark 
energy experiments ($\Delta\langle z \rangle \lesssim 3 \times 10^{-3}$ at $z \sim 1$) with this method.

Subject headings: galaxies: distances and redshifts, cosmology: large-scale structure of universe, 
methods: miscellaneous, surveys 



1. INTRODUCTION 

Almost all cosmological tests require information 
about the distance or redshift of the objects studied. 
For instance, the comoving length scale corresponding 
to baryon acoustic oscillations should remain fixed over 
time, but the corresponding angular size will depend 
on redshift in a cosmology-dependent manner; hence, if 
we measure this angular scale as a function of redshift,
we may infer cosmological parameters (Seo & Eisenstein
2003). Similarly, measuring weak lensing strength as a
function of redshift can provide strong constraints on
cosmological models (Kaiser 1998), but the observed weak
lensing signal will depend sensitively upon the redshift
distribution of the background objects studied (Huterer
2002).

However, although redshift ($z$) information is required
for interpretation, it is infeasible to measure spectroscopic
redshifts for the samples of hundreds of millions of
extremely faint galaxies to be studied by proposed
photometric dark energy probes such as the Large Synoptic
Survey Telescope (LSST; Tyson & Angel 2001; Tyson
2005) or the Supernova/Acceleration Probe (SNAP;
Deustua et al. 2000; Perlmutter & SNAP 2004), or even
the millions of faint galaxies in samples now underway
(Hoekstra et al. 2006). Hence, these projects will make
use of photometric information to infer redshift
distributions and to allow objects to be divided into multiple
redshift bins for analysis. This is possible because galaxy
spectra are generally not featureless; as a spectrum
redshifts through photometric passbands, its measured
colors will vary with $z$ in ways that may be predicted based
on spectroscopic observations of galaxies at similar
redshifts or from the spectra of local analogues.

Electronic address: janewman@pitt.edu
1 Hubble Fellow
2 Present address: Department of Physics and Astronomy, University
of Pittsburgh, 3941 O'Hara St., Pittsburgh, PA 15260

However, these "photometric redshifts" in general lack 
the precision of spectroscopically-derived redshifts, both 
because of photometric noise and outliers (e.g. in cases 
of overlapping galaxies) and because in some classes of 
galaxies (e.g. those forming stars most rapidly) the observable
spectral features are weak at most redshifts (see,
e.g., Ilbert et al. 2006). The true redshift distribution of
objects with a given photometric redshift value may or 
may not be strongly or even singly-peaked, depending 
on the galaxy type, passbands used, photometric errors, 
etc. 

Because of these difficulties, dark energy experiments
are unlikely to ever treat individual photometric
redshifts as known with precision; instead, the approach
taken in forecasts is to assume that objects will be
divided into photometric redshift bins (Albrecht et al.
2006). However, in order to obtain precision measurements
of the properties of dark energy, both weak lensing
and photometric baryonic acoustic oscillation (BAO)
experiments require that the true redshift distribution
of the objects in each bin be known with very high
accuracy. Projections for SNAP are that any overall
bias in the mean redshift of a bin must be smaller than
$2$ to $4 \times 10^{-3}$ in $z$ for dark energy constraints not to be
degraded strongly (Huterer et al. 2004; Ma et al. 2006;
Huterer et al. 2006); for LSST, it is estimated that the
mean redshift in each bin must be known to $\sim 2 \times 10^{-3}(1 + z)$
(Zhan & Knox 2006; Zhan 2006; Knox et al. 2006;
Tyson 2006; Tyson, Connolly, & Newman, in prep.).
The true width of each bin must also be known, though
with less precision ($\Delta\sigma_z \lesssim 3 \times 10^{-3}(1 + z)$ for LSST,
where $\sigma_z$ is the Gaussian sigma of the true redshift
distribution; Tyson, Connolly, & Newman, in prep.).

These targets will be difficult to meet with standard
spectroscopic techniques. Both ongoing and proposed
experiments reach depths far too faint (up to $R_{AB} \sim 30$
in the case of SNAP) for existing telescopes and
spectrographs to measure redshifts. Recent and ongoing surveys
of faint galaxies using the largest telescopes available
have obtained spectra for tens of thousands of galaxies
to $R \sim 24$ or $J \sim 23$ (Davis et al. 2006, in prep;
Le Fevre et al. 2005) and for a few thousand selected
galaxies to $R \sim 25.5$ (Steidel et al. 1999, 2003, 2004),
but precise measurements of the redshift distributions of
samples of galaxies with $R \sim 26$ to $27$ or even fainter will
be required in the future if proposed surveys are to reach
their targets.

Efforts to obtain true redshift distributions spectroscopically
are made more difficult by the fact that faint
galaxy surveys fail to obtain redshifts for a substantial
fraction of their targets. The DEEP2 Galaxy Redshift
Survey, for instance, has obtained secure redshifts
for ~70% of the galaxies studied (Cooper et al. 2006).
Roughly half of the missed targets appear to be star- 
forming galaxies at z > 1.4 (which have no features 
within the DEEP2 spectral window) based on follow-up 
observations of blue DEEP2 redshift failures (G. Stei- 
del, priv. comm.), but the redshift distribution of the 
remainder is unknown and currently being tested. 

Surveys of fainter galaxies have even lower success 
rates. Despite integration times of more than thirty 
hours per object, the Gemini Deep Deep Survey (GDDS) 
only succeeded in measuring spectroscopic redshifts for 
~15% of their targets with $24 < I < 24.5$, and < 50% of
objects with $23 < I < 24$ (Abraham et al. 2004). However,
even with a completeness as high as the Sloan Digital
Sky Survey (~99%; Schlegel et al. 2007, in prep.),
if the objects missed are not a random subsample, red- 
shift distributions could be biased beyond the tolerances 
of future dark energy surveys. 

Despite these difficulties, it is essential that there be 
some external method for testing photometric redshifts 
of faint galaxies, as it is quite likely that the spectral 
energy distributions (SEDs) of bright galaxies (whose 
redshifts are more easily obtained) should differ from 
those for fainter objects. Both locally and at z ~ 1, the 
bluest galaxies (in rest-frame color) are intrinsically faint; 
they have no luminous analogues. These issues make us- 
ing SEDs from bright galaxies to determine photomet- 
ric redshifts for fainter galaxies problematic. Further- 
more, at fainter magnitudes, higher-redshift galaxies will 
be more and more prevalent within a given photomet- 
ric redshift bin. These high-redshift galaxies may have
contributions to their SEDs from metal-deficient "Population
III" stars that appear to have no local analogues
(Jimenez & Haiman 2006). As an additional complication,
Population III contributions should be greater in
fainter, lower-mass galaxies than in more massive galaxies
at all redshifts, given the evidence that lower-mass
objects generally start forming stars later (e.g., Noeske
et al., submitted).

Despite these difficulties, if photometric dark energy 
surveys are to reach their goals, it is vital that we have 
some method of calibrating photometric redshifts with
high precision (Albrecht et al. 2006). In this paper, we
describe a new method that can determine the true red- 
shift distribution for any class of object (e.g. objects 
in a particular photometric redshift bin) by exploiting 
the fact that all galaxies at a given redshift cluster with 
each other. We presume the existence of a large sam- 
ple (or samples) of objects with spectroscopic redshifts. 
The observed angular clustering between any two sam- 
ples of galaxies will depend on both the intrinsic cluster- 
ing of objects in the two samples with each other, and 
the degree to which they overlap in redshift (since clustering
over extremely large distances is minimal). Hence,
by measuring the apparent angular cross-correlation be- 
tween the positions of the photometric objects and the 
spectroscopic sample as a function of the spectroscopic 
z, we may determine the actual redshift distribution of 
objects in the unknown class; the information provided 
by autocorrelation measurements for each sample allows 
us to break the degeneracy between correlation strength 
and redshift distribution. 

Similar cross-correlation techniques have been used
in the past to measure correlation functions (Phillipps
1985; Masjedi et al. 2006) and luminosity functions
(Phillipps & Shanks 1987) at separations or depths
where redshift surveys are incomplete; here, we explore
their use to measure redshift distributions. The angular
cross-correlation can be measured with good precision
in uniform, well-calibrated photometry, as required
for future dark energy probes, as there will be many
photometric galaxies near each spectroscopic galaxy on
the sky. The use of angular cross-correlations between
photometric redshift bins to constrain the presence of
redshift outliers has also been explored (Schneider et al.
2006; Padmanabhan et al. 2006), but cannot determine
redshift distributions in detail.

The principal requirement of this method is that red- 
shift survey data be available overlapping the photomet- 
ric sample; however, the objects with redshifts need not 
be similar to the target class (e.g., only high-confidence 
redshifts of relatively bright galaxies could be used when 
determining the redshift distribution of a sample of very 
faint galaxies). In §2, we provide the theoretical under- 
pinnings of this method. In §3 we present Monte Carlo 
tests of its effectiveness for scenarios appropriate for cur- 
rent and future redshift surveys. We evaluate potential 
sources of systematic error in §4, and in §5 we conclude. 
Throughout this paper, we will use comoving coordi- 
nates for all distances and assume a cosmology with zero 
spatial curvature. Where a specific cosmology must be
adopted, we assume a flat $\Lambda$CDM cosmology with matter
density $\Omega_m = 0.3$, dark energy density $\Omega_\Lambda = 0.7$, and
Hubble parameter $H_0 = 100h$ km s$^{-1}$ Mpc$^{-1}$.

2. MEASURING REDSHIFT DISTRIBUTIONS VIA
CROSS-CORRELATIONS

2.1. Basic Techniques 

Consider two sets of objects at cosmological distances, 
one with secure redshift measurements, which we will la- 
bel 's' (for 'spectroscopic', though exceedingly accurate 
photometric redshifts might be used), and the other with
unknown redshifts, which we will label 'p' (for 'photo- 
metric'). In the most likely applications, the photometric 
sample would be a subset of objects in some photometric 
dataset - e.g. objects in some bin of photometric redshift 
- and the spectroscopic sample would result from one or 
more redshift surveys within the region of sky covered by 
p. Although cross-correlation analyses require significant 
sample sizes, it should be possible to determine the bias 
and uncertainty in photometric redshifts as a function of 
z by studying samples in a set of photometric-redshift 
bins. 

The mean comoving number density of objects in the
photometric sample (p) a comoving real-space distance $r$
from an object in the spectroscopic sample (s) at redshift
$z$, $n_p(r, z)$, will be

$$\langle n_p(r,z) \rangle = n_p(z)\,\left(1 + \xi_{sp}(r,z)\right), \tag{1}$$

where $n_p(z)$ is the comoving number density of objects in
sample p at redshift $z$ and $\xi_{sp}(r, z)$ is the two-point
cross-correlation function between samples s and p. Hence,
$\xi_{sp}$ defines the excess probability of finding an object of
class p separated by a distance $r$ from an object of class
s that is at redshift $z$, above the probability if the two
populations do not cluster with each other. Finally, we
denote the probability distribution function for the true
redshift of an object in the photometric sample by $\phi_p(z)$.

We assume that the surveyed objects are distant from
us compared to the length over which correlations are
significant and that $\xi_{sp}(r, z)$, $n_p(z)$, and $\phi_p(z)$ may all
be treated as constant over separations in the redshift
direction comparable to that length (assumptions that
all hold in typical high-redshift samples). In the
distant-observer approximation, we may define
$r^2 = \pi_l^2 + r_p^2 = \pi_l^2 + d_A(z)^2\theta^2$, where $\pi_l$ is the comoving separation
between two objects along the line-of-sight direction, $r_p$ is
their projected comoving separation in the plane of the
sky, $d_A(z)$ is the angular size distance to redshift $z$, and
$\theta$ is their angular separation in radians. We also define
$l(z)$ to be the comoving distance to redshift $z$ given by
$l(z) = \int_0^z c/H(z')\,dz'$, where $c$ is the speed of light and
$H(z)$ the Hubble expansion parameter at redshift $z$. In
calculations for measurements of angular correlations, we
may ignore redshift-space distortions, so for a photometric
object at redshift $z'$ separated by $\pi_l$ along the
line-of-sight direction from a spectroscopic object at redshift $z$,

$$l(z') = l(z) + \pi_l.$$
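
As a concrete illustration, the distance quantities defined above can be evaluated numerically. The following is a minimal sketch, assuming the flat cosmology adopted in this paper (matter density 0.3, dark energy density 0.7) and distances in units of $h^{-1}$ Mpc; the function names and the Simpson-rule integrator are illustrative choices and are not part of the original method.

```python
import math

C_OVER_H0 = 2997.92458  # c / (100 km/s/Mpc): Hubble distance in h^-1 Mpc
OMEGA_M = 0.3           # matter density assumed in the text
OMEGA_L = 0.7           # dark energy density assumed in the text


def e_of_z(z):
    """Dimensionless Hubble rate H(z)/H0 for flat LCDM (radiation neglected)."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def dl_dz(z):
    """dl/dz = c / H(z), in h^-1 Mpc per unit redshift."""
    return C_OVER_H0 / e_of_z(z)


def comoving_distance(z, n_steps=200):
    """l(z) = integral of c/H(z') from 0 to z, by composite Simpson's rule.

    n_steps must be even.
    """
    if z == 0.0:
        return 0.0
    h = z / n_steps
    total = dl_dz(0.0) + dl_dz(z)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * dl_dz(i * h)
    return total * h / 3.0


def angular_size_distance(z):
    """Comoving angular size distance d_A(z).

    Assumption: with zero spatial curvature and comoving coordinates,
    d_A(z) equals the comoving distance l(z).
    """
    return comoving_distance(z)


if __name__ == "__main__":
    print(comoving_distance(1.0), dl_dz(1.0))
```

Under these assumptions the comoving distance to $z = 1$ comes out near 2300 $h^{-1}$ Mpc, so a transverse separation of 10 $h^{-1}$ Mpc subtends about a quarter of a degree there.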

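The fundamental quantity we wish to recover is $\phi_p(z)$,
the probability distribution for the true redshift of an
object in p. It can be related to $n_p$ and the mean surface
density of objects in p on the sky (in units of objects per
steradian), denoted here by $\Sigma_p$:

$$\phi_p(z) = \frac{dN_p/(dz\,d\Omega)}{\int_0^\infty \left[dN_p/(dz'\,d\Omega)\right] dz'} = \frac{1}{\Sigma_p}\,\frac{dN_p}{dz\,d\Omega} = \frac{n_p(z)}{\Sigma_p}\,\frac{dV}{dz\,d\Omega} = \frac{n_p(z)\,d_A(z)^2}{\Sigma_p}\,\frac{dl}{dz}, \tag{2}$$

where $dN_p/(dz\,d\Omega)$ is the number of objects in the photometric
sample per unit redshift per steradian and
$dV/(dz\,d\Omega)$ is the amount of comoving volume per unit
redshift per steradian (equal to $d_A(z)^2\,dl/dz$).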

Equation 1 gives the excess number density of photometric
objects near a spectroscopic object as a function
of their real-space separation and the spectroscopic
object's redshift. However, what we are actually able to
measure is the excess number of objects per unit area on
the sky. We therefore multiply Equation 1 by $dV/(dz\,d\Omega)$
and integrate over the possible redshifts of a photometric
object, $z'$, to obtain $\langle \Sigma(\theta,z) \rangle$, the mean surface density
of objects in p an angle $\theta$ from an object in s at redshift
$z$. We then obtain:

$$\begin{aligned}
\langle \Sigma(\theta,z) \rangle &= \int_0^\infty n_p(z')\,d_A(z')^2\,\frac{dl}{dz'}\,dz' + \int_0^\infty \xi_{sp}(r,z)\,n_p(z')\,d_A(z')^2\,\frac{dl}{dz'}\,dz' \\
&= \int_0^\infty \Sigma_p\,\phi_p(z')\,dz' + \int_0^\infty \xi_{sp}(r,z)\,\Sigma_p\,\phi_p(z')\,dz' \\
&= \Sigma_p\,\left(1 + w_{sp}(\theta,z)\right),
\end{aligned} \tag{3}$$

where $w_{sp}(\theta,z)$ defines the angular cross-correlation
function between the spectroscopic and photometric
samples and $\Sigma_p$ is the mean surface density of objects
in p over the sky. This constitutes the principal observable
we will use to reconstruct the redshift distribution
of the photometric sample (sample p).

For convenience, we assume that all correlation functions
may be described by power laws with linear biasing.
This assumption is somewhat unrealistic (in
real applications, precision measurements should use a
halo model, cf. Cooray & Sheth 2002 and references
therein, or other more sophisticated methods) but is
sufficiently accurate to predict uncertainties for
cross-correlation methods. If $\xi_{sp}$ is represented by the power
law form $\xi_{sp}(r) = (r/r_{0,sp})^{-\gamma}$, then the integrals in
Equation 3 may be evaluated analytically to obtain:

$$w_{sp}(\theta,z) = \frac{\phi_p(z)\,H(\gamma)\,r_{0,sp}^{\gamma}\,\theta^{1-\gamma}\,d_A(z)^{1-\gamma}}{dl/dz}, \tag{4}$$

where $H(\gamma) = \Gamma(1/2)\,\Gamma((\gamma-1)/2)/\Gamma(\gamma/2)$ (Peebles 1980),
and we have treated $\phi_p(z)$ and $d_A(z)$ as constant over
the range in $z'$ for which $\xi_{sp}$ is nonnegligible (i.e., where
$l(z) - l(z')$ is not much greater than $r_{0,sp}$).
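
The analytic step behind the power-law result can be checked numerically. The sketch below is an illustration only: it evaluates $H(\gamma)$ with the standard gamma function, compares it against a direct numerical integration of $(\pi_l^2 + r_p^2)^{-\gamma/2}$ along the line of sight, and evaluates the resulting power-law form of $w_{sp}$. The function names, the midpoint integrator, and the example input values are assumptions made for this example.

```python
import math


def h_gamma(gamma):
    """H(gamma) = Gamma(1/2) Gamma((gamma - 1)/2) / Gamma(gamma/2)."""
    return (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
            / math.gamma(gamma / 2.0))


def los_integral(r_p, gamma, n_steps=20000):
    """Integrate (pi_l^2 + r_p^2)^(-gamma/2) over all pi_l.

    Uses the substitution pi_l = r_p * tan(t), which maps the integrand to
    r_p^(1 - gamma) * cos(t)^(gamma - 2) on (-pi/2, pi/2), and a midpoint rule.
    """
    h = math.pi / n_steps
    total = 0.0
    for i in range(n_steps):
        t = -0.5 * math.pi + (i + 0.5) * h
        total += math.cos(t) ** (gamma - 2.0)
    return r_p ** (1.0 - gamma) * total * h


def w_sp(theta, phi_p, r0_sp, gamma, d_a, dl_dz):
    """Power-law angular cross-correlation at angle theta (radians).

    phi_p, d_a and dl_dz are the values of phi_p(z), d_A(z) and dl/dz at the
    redshift of the spectroscopic object; lengths share one unit (h^-1 Mpc).
    """
    return (phi_p * h_gamma(gamma) * r0_sp ** gamma
            * (theta * d_a) ** (1.0 - gamma) / dl_dz)


if __name__ == "__main__":
    print(h_gamma(1.8), los_integral(1.0, 1.8))
```

For a projected separation of unity the numerical line-of-sight integral should agree with `h_gamma(gamma)` to well under a percent, which is the identity used to pass from the integral form to the power-law form.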

Although future applications of this method may make
use of angular information (e.g. to constrain biasing
models), when determining the uncertainties resulting
from application of cross-correlation methods below, we
will focus on the integral of $w_{sp}$ within an angle equivalent
to some comoving distance $r_{max}$ (which we will leave
fixed with $z$) for simplicity. We label this integrated $w_{sp}$
$\tilde{w}(z)$, and define the angle corresponding to $r_{max}$ at a
given $z$ to be $\theta_{max}(z)$. Integrating Equation 3 over $\theta$, we
may relate $\phi_p$ to $\tilde{w}$ by the equation:

$$\phi_p(z) = \tilde{w}(z)\,\frac{3-\gamma}{2\pi}\,\frac{d_A(z)^2\,dl/dz}{H(\gamma)\,r_{0,sp}^{\gamma}\,r_{max}^{3-\gamma}}. \tag{5}$$
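
The intermediate step can be written out explicitly. The following is a sketch, assuming that the integrated correlation is taken over solid angle in the small-angle limit (area element $2\pi\theta\,d\theta$), that the power-law form of $w_{sp}$ applies, and that $\theta_{max} = r_{max}/d_A(z)$:

```latex
\begin{aligned}
\tilde{w}(z) &\equiv \int_0^{\theta_{max}(z)} w_{sp}(\theta,z)\,2\pi\theta\,d\theta
  = \frac{2\pi\,\phi_p(z)\,H(\gamma)\,r_{0,sp}^{\gamma}\,d_A(z)^{1-\gamma}}{dl/dz}
    \int_0^{\theta_{max}} \theta^{2-\gamma}\,d\theta \\
 &= \frac{2\pi}{3-\gamma}\,
    \frac{\phi_p(z)\,H(\gamma)\,r_{0,sp}^{\gamma}\,d_A(z)^{1-\gamma}\,\theta_{max}^{3-\gamma}}{dl/dz}
  = \frac{2\pi}{3-\gamma}\,
    \frac{\phi_p(z)\,H(\gamma)\,r_{0,sp}^{\gamma}\,r_{max}^{3-\gamma}}{d_A(z)^{2}\,dl/dz}.
\end{aligned}
```

Solving the last expression for $\phi_p(z)$ gives the relation between $\phi_p$ and $\tilde{w}$ used in the rest of this section.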






In general, a useful choice of $r_{max}$ should be large
enough that the effects of nonlinear biasing (which can
complicate modeling of correlation functions) are small,
but small compared to the angular size of the photometric
sample to minimize edge effects. Our fiducial scenario
will use $r_{max} = 10\,h^{-1}$ Mpc, corresponding to roughly a
quarter of a degree at $z = 1$. Correlation functions at
both $z \sim 0$ and $z \sim 1$ are closely approximated by power
laws at this scale (Zehavi et al. 2005; Coil et al. 2006c).

Inspection of Equations 4 and 5 shows that to determine
$\phi_p(z)$ from $w_{sp}$ or $\tilde{w}$, we must know the basic
cosmology (sufficient to determine $d_A(z)$ and $dl/dz$ modulo
factors of $h$), $r_{0,sp}$, and $\gamma$. We test the degree to which the
cosmology must be known in §4.4; realistic uncertainties
in cosmological parameters prove to have negligible
impact.

The same observations used to measure $w_{sp}$ provide
sufficient information to determine $r_{0,sp}$ and $\gamma$ via the
autocorrelation functions of the photometric and
spectroscopic samples, $\xi_{pp}$ and $\xi_{ss}$. This is true because,
under our assumption of linear biasing, the cross-correlation
$\xi_{sp}(r)$ must be given by the geometric mean of the
autocorrelations of the two samples, $\xi_{sp} = (\xi_{ss} \times \xi_{pp})^{1/2}$;
we will refer to this as the "simple biasing" assumption
hereafter. This equation holds to high accuracy for the
measured cross-correlations between subsamples in modern
redshift surveys, even when their clustering differs
strongly (Coil et al. 2007). The autocorrelation function
of the spectroscopic sample, $\xi_{ss}$, is measurable directly
from the spectroscopic sample, and is in fact a prime
observable of redshift surveys; it thus remains only to
determine $\xi_{pp}$.
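
For power-law correlation functions, the geometric-mean relation fixes the cross-correlation parameters directly. The sketch below illustrates this; the function name and the example scale lengths are hypothetical, and allowing the two autocorrelation slopes to differ is a generalization made for this example, not something the text requires.

```python
def cross_correlation_params(r0_ss, gamma_ss, r0_pp, gamma_pp):
    """Power-law parameters of xi_sp = sqrt(xi_ss * xi_pp).

    Each autocorrelation is assumed to have the form (r / r0) ** (-gamma).
    The product of two power laws is a power law, so the geometric mean has
    slope (gamma_ss + gamma_pp) / 2 and a scale length r0_sp satisfying
    r0_sp ** gamma_sp = r0_ss ** (gamma_ss / 2) * r0_pp ** (gamma_pp / 2).
    """
    gamma_sp = 0.5 * (gamma_ss + gamma_pp)
    r0_sp = (r0_ss ** (0.5 * gamma_ss)
             * r0_pp ** (0.5 * gamma_pp)) ** (1.0 / gamma_sp)
    return r0_sp, gamma_sp


if __name__ == "__main__":
    # Hypothetical scale lengths in h^-1 Mpc with a common slope.
    print(cross_correlation_params(5.0, 1.8, 3.2, 1.8))
```

When both samples share the same slope, the cross-correlation scale length reduces to the geometric mean of the two autocorrelation scale lengths.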

We may use the observed angular autocorrelation of
the photometric sample, $w_{pp}(\theta)$, in conjunction with an
initial guess for $\phi_p(z)$ to obtain the mean parameters of
$\xi_{pp}$, since they are related by Limber's equation
(evaluated for a power law correlation function, with scale
length $r_{0,p}$ a function of $z$ but exponent $\gamma_p$ constant):

$$w_{pp}(\theta) = H(\gamma_p)\,\theta^{1-\gamma_p} \int_0^\infty \phi_p(z)^2\,r_{0,p}(z)^{\gamma_p}\,\frac{d_A(z)^{1-\gamma_p}}{dl/dz}\,dz \tag{6}$$

(Peebles 1980). Note that $\gamma_p$ can be measured directly
from the shape of $w_{pp}(\theta)$, so given a form for $\phi_p$ and the
cosmology, the mean value of $r_{0,p}$ may be determined
directly from its amplitude (in general, $\gamma$ varies only
modestly, by < 10%, even amongst samples of galaxies
with very different biasing; Zehavi et al. 2005). This
procedure may be iterated by using the derived parameters
of $\xi_{pp}$ to determine $\xi_{sp}$ and hence $\phi_p$ from
cross-correlations, then redetermining $\xi_{pp}$ using this $\phi_p$, and
then refining $\phi_p$ from cross-correlations, etc. until
convergence is reached. Although $w_{pp}$ yields only a weighted
mean of the value of $r_{0,p}(z)$, the consequences of feasible
amounts of variation in the bias of the photometric
sample with redshift are modest; we demonstrate this in §4.
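
The amplitude step of this procedure can be illustrated numerically. The sketch below assumes the standard power-law form of Limber's equation, $w_{pp}(\theta) = A\,\theta^{1-\gamma_p}$ with $A = H(\gamma_p) \int \phi_p^2\, r_{0,p}^{\gamma_p}\, d_A^{1-\gamma_p} (dl/dz)^{-1} dz$, a redshift-independent scale length, and the flat cosmology adopted in this paper. All function names and the midpoint integration are illustrative assumptions, not the author's implementation.

```python
import math

C_OVER_H0 = 2997.92458        # Hubble distance in h^-1 Mpc
OMEGA_M, OMEGA_L = 0.3, 0.7   # flat cosmology assumed in the text


def dl_dz(z):
    """dl/dz = c / H(z) in h^-1 Mpc."""
    return C_OVER_H0 / math.sqrt(OMEGA_M * (1.0 + z) ** 3 + OMEGA_L)


def h_gamma(gamma):
    return (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
            / math.gamma(gamma / 2.0))


def limber_kernel(phi_p, gamma, z_min, z_max, n_bins=400):
    """Midpoint-rule integral of phi_p(z)^2 d_A(z)^(1-gamma) / (dl/dz) dz."""
    n0 = 400
    # Comoving distance to z_min (equal to d_A for zero curvature).
    dist = sum(dl_dz((i + 0.5) * z_min / n0) for i in range(n0)) * z_min / n0
    dz = (z_max - z_min) / n_bins
    total = 0.0
    for i in range(n_bins):
        z_mid = z_min + (i + 0.5) * dz
        d_a = dist + 0.5 * dl_dz(z_mid) * dz   # distance to the bin centre
        total += phi_p(z_mid) ** 2 * d_a ** (1.0 - gamma) / dl_dz(z_mid) * dz
        dist += dl_dz(z_mid) * dz              # advance to the bin's upper edge
    return total


def wpp_amplitude(r0_p, gamma, phi_p, z_min, z_max):
    """Amplitude A of w_pp(theta) = A * theta^(1 - gamma), theta in radians."""
    return h_gamma(gamma) * r0_p ** gamma * limber_kernel(phi_p, gamma, z_min, z_max)


def mean_r0_from_amplitude(amplitude, gamma, phi_p, z_min, z_max):
    """Invert the amplitude for the mean scale length, given phi_p."""
    kernel = limber_kernel(phi_p, gamma, z_min, z_max)
    return (amplitude / (h_gamma(gamma) * kernel)) ** (1.0 / gamma)
```

In an iterative scheme of the kind described above, `mean_r0_from_amplitude` would be re-evaluated each time the estimate of $\phi_p$ is updated, with the measured amplitude held fixed.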

To summarize: the most basic large-scale structure 
measurements possible where a photometric sample over- 
laps a spectroscopic one - the two-point autocorrelation 
functions of each sample with itself, and their cross- 
correlation on the sky, measured as a function of spectro- 
scopic redshift - provide sufficient information to recon- 
struct the redshift distribution of the photometric sam- 
ple. In the remainder of this paper, we will attempt to 
determine the uncertainties, both random and systematic,
that should result from applying such methods to
realistic samples.

2.2. Error Estimates 

We begin by estimating the error in a measurement
of $\phi_p(z)$ in some small bin of redshift centered at $z$
and of width $\Delta z$. We presume here that the errors in
$\phi_p$ will be dominated by the uncertainty due to counting
statistics in a measurement of $\tilde{w}$, the integral of $w_{sp}$
within the angle $\theta_{max}(z)$. Poisson uncertainties should
dominate when $\tilde{w}$ is small (the "weak-clustering" limit),
which should always be the case unless $\phi_p$ is unrealistically
narrow (Peebles 1980). We defer investigation of
possible systematic errors to §4.

Modulo the modest impact of sample variance (see
§3.4), we expect uncertainties to be dominated by errors
in $\tilde{w}$, as the autocorrelations $\xi_{ss}$ and $w_{pp}$ should be
measured more precisely. Then, applying standard propagation
of errors to Equation 5, $\sigma(\phi_p(z))/\phi_p(z) = \sigma(\tilde{w})/\tilde{w}$.
Furthermore, $\sigma(\tilde{w})/\tilde{w}$ must equal the uncertainty in the
total excess (over random) number of neighbors in p
surrounding any member of s due to clustering (which we
will label $\sigma(N_c)$), divided by the expected number of
these neighbors (denoted by $N_c$), as $\tilde{w}$ is directly
proportional to $N_c$ by definition. Since these quantities are
simple to predict, we will determine $\sigma(\phi_p(z))/\phi_p(z)$ by
calculating the equivalent quantity, $\sigma(N_c)/N_c$.

In the weak-clustering limit (i.e., so long as $\tilde{w}$ is small,
as is true here), the uncertainty in $N_c$, $\sigma(N_c)$, is given
simply by the Poisson uncertainty in the expected total
number of spectroscopic-photometric pairs if there is no
clustering (Peebles 1980). Thus,

$$\sigma^2(N_c) = \left(\Sigma_p\,\pi\,\theta_{max}^2\right) \left(\frac{dN_s}{dz}\,\Delta z\right), \tag{7}$$

where $dN_s/dz$ gives the actual redshift distribution of
the spectroscopic sample. Inside the first parentheses in
Equation 7 is found the expected number of members of
the photometric sample within $\theta_{max}$ of each object in the
spectroscopic sample, while inside the second parentheses
we give the number of objects in s within the designated
redshift bin.

We may determine $N_c$, the total number of excess
spectroscopic-photometric (s-p) pairs within separation
$\theta_{max}$ over random due to correlations, by integrating the
real-space two-point cross-correlation function over the
relevant volume:

$$\begin{aligned}
N_c &= \left( \int_0^{r_{max}} 2\pi r_p \int_0^\infty n_p(z')\,\xi_{sp}(r,z)\,\frac{dl}{dz'}\,dz'\,dr_p \right) \times \left( \frac{dN_s}{dz}\,\Delta z \right) \\
&= 2\pi H(\gamma)\,\frac{\phi_p(z)\,\Sigma_p}{d_A(z)^2\,dl/dz}\,r_{0,sp}^{\gamma}\,\frac{dN_s}{dz}\,\Delta z \int_0^{r_{max}} r_p^{2-\gamma}\,dr_p \\
&= \frac{2\pi H(\gamma)}{3-\gamma}\,\frac{\phi_p(z)\,\Sigma_p}{d_A(z)^2\,dl/dz}\,r_{0,sp}^{\gamma}\,r_{max}^{3-\gamma}\,\frac{dN_s}{dz}\,\Delta z,
\end{aligned} \tag{8}$$

where again we have separated the number of members of
the photometric sample around each member of the
spectroscopic sample from the total number of members of
the spectroscopic sample within $\Delta z$ using parentheses
before combining them, and assumed that $\phi_p(z)$ is
approximately constant over the range in $l(z')$ where $\xi_{sp}(r, z)$ is
nonnegligible.

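Combining Equations 7 and 8, we then find:

$$\frac{\sigma(N_c)}{N_c} = \frac{3-\gamma}{2\sqrt{\pi}\,H(\gamma)} \left( \Sigma_p\,\frac{dN_s}{dz}\,\Delta z \right)^{-1/2} \frac{d_A(z)\,dl/dz}{\phi_p(z)\,r_{0,sp}^{\gamma}\,r_{max}^{2-\gamma}}. \tag{9}$$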
It is worth noting that these errors scale only very
slowly with $r_{max}$, since for typical galaxy samples $\gamma \sim 1.6$ to $1.9$.
As a consequence, it is possible to minimize
nonlinear effects not only by measuring correlations to
large separations (i.e., increasing $r_{max}$), but also by
excluding the smallest separations ($r_p < 1$ to $2\,h^{-1}$ Mpc)
from the calculation of the integrated correlation, $\tilde{w}(z)$.
Predicted measurement uncertainties increase only modestly
if small-separation pairs are not considered; for
$r_{max} = 10\,h^{-1}$ Mpc, excluding the central $2\,h^{-1}$ Mpc
increases overall errors by roughly 15%. That exclusion
radius is larger than the maximum $r_p$ where non-power
law cross-correlations have been observed in studies of
the clustering of blue, star-forming galaxies about galaxy
groups (Coil et al. 2006), which is likely to be a maximally
pathological case.
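
The per-bin error budget of this section can be evaluated directly. The sketch below computes the Poisson pair variance and the expected excess pair count for a single redshift bin and returns their ratio. The fiducial values ($r_{0,sp} = 4\,h^{-1}$ Mpc, $\gamma = 1.8$), the treatment of an excluded inner radius as an annulus, and the function names are assumptions made for illustration.

```python
import math

ARCMIN2_PER_STERADIAN = (180.0 * 60.0 / math.pi) ** 2


def h_gamma(gamma):
    return (math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0)
            / math.gamma(gamma / 2.0))


def fractional_error(phi_p, sigma_p_arcmin2, dns_dz, delta_z, d_a, dl_dz,
                     r0_sp=4.0, r_max=10.0, gamma=1.8, r_min=0.0):
    """sigma(N_c) / N_c for one redshift bin in the weak-clustering limit.

    Lengths are in h^-1 Mpc. r_min > 0 excludes pairs with r_p < r_min
    (an assumption of this sketch: both the random pair count and the
    excess pair count are then taken over the annulus r_min..r_max).
    """
    sigma_p = sigma_p_arcmin2 * ARCMIN2_PER_STERADIAN   # objects per steradian
    n_spec = dns_dz * delta_z                           # spectroscopic objects in bin
    # Poisson variance: unclustered s-p pairs within the allowed area.
    area_sr = math.pi * (r_max ** 2 - r_min ** 2) / d_a ** 2
    var_nc = sigma_p * area_sr * n_spec
    # Expected excess pairs from the power-law cross-correlation.
    n_c = (2.0 * math.pi * h_gamma(gamma) / (3.0 - gamma)
           * phi_p * sigma_p / (d_a ** 2 * dl_dz) * r0_sp ** gamma
           * (r_max ** (3.0 - gamma) - r_min ** (3.0 - gamma)) * n_spec)
    return math.sqrt(var_nc) / n_c


if __name__ == "__main__":
    # Illustrative inputs near z = 1: peak of a Gaussian phi_p with sigma_z = 0.1,
    # 10 galaxies/arcmin^2, dN_s/dz = 25,000, bin width 0.01.
    base = fractional_error(3.989, 10.0, 25000.0, 0.01, 2313.0, 1703.0)
    cut = fractional_error(3.989, 10.0, 25000.0, 0.01, 2313.0, 1703.0, r_min=2.0)
    print(base, cut / base)
```

With these illustrative inputs the fractional error per bin is a few percent, and excluding the central $2\,h^{-1}$ Mpc raises it by roughly 15% for $\gamma = 1.8$, in line with the figure quoted above.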

2.3. Sample Variance 

These error estimates have ignored the fact that the 
mean density of the regions where we perform these 
cross-correlation measurements at a given z may be 
higher or lower than the Universal mean, the effect com- 
monly referred to as "sample" or "cosmic" variance. 
Thus, the recovered redshift distribution of the photometric
sample, $\phi_r(z)$, will differ from the distribution
that would be obtained if an infinite volume were surveyed,
$\phi_p(z)$. If the region over which cross-correlations
are measured corresponds to the full area covered by the
photometric sample, then in fact $\phi_r$ may be the desired
quantity, rather than the "true", underlying distribution,
$\phi_p$. Outside of this regime, we must consider the impact
that sample variance will have on our recovery of $\phi_p(z)$.

In particular, we can place two limits on the impact 
of sample variance. If cross-correlations are measured 
over very large areas of sky (hundreds of square degrees), 
sample variance should be negligible compared to other 
sources of error, and the random errors in the recovery 
of $\phi_p(z)$ will simply be given by Equation 9. We will
assess how much area is sufficient to reach this regime
in §3.4. Thus, our previous error estimates are in fact
a lower limit on measurement uncertainties from cross- 
correlation techniques. 

If only a few fields with small areas are surveyed, the 
impact of sample variance is much greater. However, the 
spectroscopic sample may be used to limit this impact, 
as the variations in density will cause proportional varia- 
tions in the number of galaxies found in a given redshift 
bin (compared to a smooth model). Since $w_{sp}(z)$ measures
the excess number of companions per spectroscopic
object at a given $z$, these variations in the redshift
distribution of the spectroscopic sample, s, do not affect
the measured $\phi_p(z)$ directly. However, there will be
corresponding variations in the number of members of the
photometric sample p at that redshift, with the amplitude
of those variations proportional to the ratio of the
large-scale bias of sample p to that of sample s.

Since that ratio of biases would be determined in the
process of measuring $\phi_p(z)$ from cross-correlations, we
may estimate the universal value of $\phi_p(z)$ from the value
reconstructed in a particular region of the sky:

$$\phi_p(z) = \frac{\phi_r(z)}{1 + (b_p/b_s)\,\Delta_s(z)}, \tag{10}$$

where $b_p$ and $b_s$ are the linear, large scale biases of
the photometric and spectroscopic samples (p and s,
respectively) and $\Delta_s(z)$ is the fractional deviation of
$dN_s/dz$ at a given redshift from a smooth model; i.e.,
$[(dN_s/dz)_{observed} - (dN_s/dz)_{true}]/(dN_s/dz)_{true}$. Because
$(dN_s/dz)_{observed}$ is determined from the finite number of
spectroscopic objects within $\Delta z$, it will be subject to
Poisson variance; hence $\Delta_s(z)$ has a measurement
uncertainty $\sigma(\Delta_s) = (dN_s/dz \times \Delta z)^{-1/2}$. This will propagate
into a residual uncertainty in $\phi_p(z)$ of
$(b_p/b_s)\,(dN_s/dz \times \Delta z)^{-1/2}\,\phi_p(z)$ (taking $(1 + (b_p/b_s)\Delta_s)^2 \sim 1$, which holds
for all realistic survey characteristics). This error is
independent of counting-statistics errors; thus when assessing
the maximal impact of sample variance, we combine
it with the measurement uncertainty given by
Equation 9 following standard propagation of errors.
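
The combination of the two independent error terms can be written as a quadrature sum. The sketch below is illustrative only: the function name is hypothetical, `sigma_count` stands for the absolute counting-statistics error on $\phi_p$ in a bin, and the default bias ratio of 1 is an assumption of the example.

```python
import math


def total_phi_error(phi_p, sigma_count, dns_dz, delta_z, bias_ratio=1.0):
    """Quadrature sum of counting-statistics and residual sample-variance errors.

    sigma_count : absolute counting-statistics error on phi_p in this bin
    bias_ratio  : b_p / b_s, ratio of large-scale biases (assumed, default 1)
    The sample-variance term is (b_p/b_s) * (dN_s/dz * dz)^(-1/2) * phi_p.
    """
    sigma_sv = bias_ratio * (dns_dz * delta_z) ** -0.5 * phi_p
    return math.sqrt(sigma_count ** 2 + sigma_sv ** 2)


if __name__ == "__main__":
    # Illustrative values: phi_p = 4 near the peak, counting error 0.11,
    # dN_s/dz = 25,000 and bin width 0.01 (250 spectroscopic objects per bin).
    print(total_phi_error(4.0, 0.11, 25000.0, 0.01))
```

Because the sample-variance term is proportional to $\phi_p(z)$, it matters most near the peak of the distribution and vanishes in the wings, where only the counting term remains.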

Because it increases overall uncertainties the most
where $\phi_p(z)$ is largest, the net effect of sample variance
after correcting with $(dN_s/dz)_{observed}$ is to reduce modestly
the advantages of samples with tight redshift
distributions or high surface densities. Since errors from
both counting statistics and sample variance scale as
$(dN_s/dz \times \Delta z)^{-1/2}$, though, both of these sources of
uncertainty will be reduced by the same fraction if $dN_s/dz$
is increased. In our Monte Carlo simulations, we
assume $b_p = b_s$. We expect that for typical datasets
$b_p < b_s$, as photometric samples should go fainter than
spectroscopic samples, and fainter objects tend to have
lower bias (Zehavi et al. 2005; Coil et al. 2006c), making
this assumption an upper limit. In the simulations below,
we will estimate the errors in the reconstruction of
$\phi_p(z)$ both when sample variance is negligible and when
$dN_s/dz$ is used for corrections, in order to bracket the
possibilities.

3. MONTE CARLO TESTS 

3.1. Basic Scenarios 

We now investigate the degree to which cross-correlation
techniques can recover true redshift distributions
for photometric samples. For our most basic scenario,
we adopt a simple $\phi_p(z)$ distribution given by a
Gaussian with mean redshift $z_0$ (which we generally take
to be 1, near the peak of sensitivity of most dark energy
measurement methods) and standard deviation $\sigma_z$; i.e.,

$$\phi_p(z) = g(z) = \frac{1}{\sqrt{2\pi}\,\sigma_z} \exp\left( -\frac{(z - z_0)^2}{2\sigma_z^2} \right). \tag{11}$$



We then use Monte Carlo techniques to test the recovery
of both the mean and standard deviation of this distribution
given a spectroscopic sample with some redshift
distribution $dN_s/dz$. A test of these techniques with
catalogs taken from an N-body simulation is now underway,
and finding similar results (Wittman 2008, in prep.).

Fig. 1. Redshift distributions assumed for current spectroscopic
samples (blue dashed line) and future samples (red solid
line). The assumed characteristics of each sample are given in
Table 1. The differences are the addition of an intermediate-redshift
survey, PRIMUS (Eisenstein et al. 2007, in prep.); a baryonic
oscillation survey, WiggleZ (Glazebrook et al. 2007, in prep.);
zCOSMOS (Lilly & the zCOSMOS Team 2005); and larger samples at
$z > 2$ in the near-future scenario. These samples were used to
produce the Monte Carlo realizations shown in Figure 2. The black,
dot-dashed line indicates the assumption used for our standard
scaling scenario, which approximates current redshift samples at
$z \sim 1$.

To perform these Monte Carlo tests, we generate
realizations of the recovered $\phi_p(z)$ in a large number of bins
of width $\Delta z$, adding to the true $\phi_p(z)$ in each bin an
error drawn randomly from a Gaussian distribution with
mean zero and standard deviation given by $\sigma(\phi_p(z))$ for
that bin, incorporating both counting statistics and sample
variance as described above. Where $\sigma_z > 0.1$, we use
bins of width $\Delta z = 0.01$; otherwise, the bin width used is
$\Delta z = 0.01 \times (\sigma_z/0.1)$ to ensure the peak is well-resolved.
For each realization, we fit for the parameters of $\phi_p$ with
standard nonlinear least-squares techniques. To ensure
stability in the fitting, we provide initial guesses for the
parameters given by the true value plus a random value
drawn from a Gaussian distribution with standard deviation
10% of the true value. This is effectively equivalent
to assuming that the true distribution parameters are
known to 10%, far worse than the tolerances for most
dark energy experiments and much larger than the errors
resulting from the cross-correlation measurements.
We show example realizations and reconstructions (based
on the redshift distributions shown in Figure 1) in Figure 2.

For every scenario tested in this paper, we generate ten
thousand realizations of this sort, measuring the parameters
of $\phi_p(z)$ each time; we then determine the mean
and standard deviation of the results for each parameter
to test the efficacy of cross-correlation methods. For our
basic scenarios, we vary three things: the width of $\phi_p(z)$,
$\sigma_z$; the surface density of members of the photometric
sample on the sky, $\Sigma_p$; and the redshift distribution of
the spectroscopic sample, $dN_s/dz$.
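
The Monte Carlo procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for this paper: the per-bin error is taken to be a single constant, `sigma_phi` (a hypothetical stand-in for the bin-by-bin $\sigma(\phi_p(z))$), the fit is a small hand-written Levenberg-Marquardt loop rather than a library routine, and the redshift range and default number of realizations are reduced to keep the run short.

```python
import math
import random
import statistics


def gaussian(z, mean, sigma):
    """Normalized Gaussian in redshift (the form of Equation 11)."""
    u = (z - mean) / sigma
    return math.exp(-0.5 * u * u) / (math.sqrt(2.0 * math.pi) * sigma)


def bin_width(sigma_z):
    """Bin width rule from the text: 0.01, shrunk in proportion for narrow peaks."""
    return 0.01 if sigma_z > 0.1 else 0.01 * (sigma_z / 0.1)


def sum_sq(z_bins, phi_obs, mean, sigma):
    """Sum of squared residuals between the data and a Gaussian model."""
    return sum((y - gaussian(z, mean, sigma)) ** 2 for z, y in zip(z_bins, phi_obs))


def fit_gaussian(z_bins, phi_obs, mean, sigma, max_iter=200):
    """Unweighted least-squares fit for (mean, sigma) by Levenberg-Marquardt."""
    lam = 1e-3
    best = sum_sq(z_bins, phi_obs, mean, sigma)
    for _ in range(max_iter):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for z, y in zip(z_bins, phi_obs):
            g = gaussian(z, mean, sigma)
            u = (z - mean) / sigma
            d_mean = g * u / sigma               # dg/d(mean)
            d_sigma = g * (u * u - 1.0) / sigma  # dg/d(sigma)
            resid = y - g
            a11 += d_mean * d_mean
            a12 += d_mean * d_sigma
            a22 += d_sigma * d_sigma
            b1 += d_mean * resid
            b2 += d_sigma * resid
        # Damped 2x2 normal equations, solved in closed form.
        m11, m22 = a11 * (1.0 + lam), a22 * (1.0 + lam)
        det = m11 * m22 - a12 * a12
        step_mean = (b1 * m22 - b2 * a12) / det
        step_sigma = (m11 * b2 - a12 * b1) / det
        new_sigma = sigma + step_sigma
        trial = (sum_sq(z_bins, phi_obs, mean + step_mean, new_sigma)
                 if new_sigma > 0.0 else float("inf"))
        if trial < best:                         # accept the step
            converged = best - trial <= 1e-12 * best
            mean, sigma, best = mean + step_mean, new_sigma, trial
            lam = max(lam / 10.0, 1e-9)
            if converged:
                break
        else:                                    # reject it and damp harder
            lam *= 10.0
            if lam > 1e12:
                break
    return mean, sigma


def run_monte_carlo(n_real=200, true_mean=1.0, true_sigma=0.1,
                    sigma_phi=0.05, seed=1):
    """Return (mean, scatter) of the fitted mean and of the fitted sigma."""
    rng = random.Random(seed)
    dz = bin_width(true_sigma)
    n_bins = int(round(1.0 / dz))                # bins span true_mean +/- 0.5
    z_bins = [true_mean - 0.5 + (i + 0.5) * dz for i in range(n_bins)]
    truth = [gaussian(z, true_mean, true_sigma) for z in z_bins]
    means, sigmas = [], []
    for _ in range(n_real):
        phi_obs = [t + rng.gauss(0.0, sigma_phi) for t in truth]
        # Initial guesses scattered by 10% of the true values, as in the text.
        mean0 = true_mean + rng.gauss(0.0, 0.1 * true_mean)
        sigma0 = true_sigma + rng.gauss(0.0, 0.1 * true_sigma)
        m, s = fit_gaussian(z_bins, phi_obs, mean0, sigma0)
        means.append(m)
        sigmas.append(s)
    return (statistics.mean(means), statistics.stdev(means),
            statistics.mean(sigmas), statistics.stdev(sigmas))
```

Calling `run_monte_carlo()` returns the average and scatter of the fitted mean and width; the scatter values play the role of the errors quoted in this section, though their size here depends entirely on the assumed `sigma_phi`.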

We here ignore the weak cross-correlations induced
by gravitational lensing. These correlations can be predicted
directly from the observed galaxy number counts
(Scranton et al. 2005). Alternately, it should be possible
to iteratively remove the lensing-induced signal once we
have an estimate of $\phi_p(z)$, as that will allow us to predict
how much cross-correlation with members of $s$ at a given
$z$ should result from lensing.

The results of these simulations are shown in Figures
3, 4, and 5. For samples with constant $dN_s/dz$, if sample
variance is negligible, the Monte Carlo simulations
find that the errors in determining either $\langle z \rangle$ or $\sigma_z$ are
identical, and can be fit extremely well (to within 1%)
by:



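$$\sigma = 9.1 \times 10^{-4} \left(\frac{\sigma_z}{0.1}\right)^{1.5} \left(\frac{\Sigma_p}{10}\right)^{-1/2} \left(\frac{dN_s/dz}{25,000}\right)^{-1/2} \left(\frac{4\,h^{-1}\,{\rm Mpc}}{r_{0,sp}}\right)^{\gamma} \left(\frac{10\,h^{-1}\,{\rm Mpc}}{r_{\rm max}}\right)^{2-\gamma}, \tag{12}$$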



where $\Sigma_p$ is expressed in galaxies per square arcminute.
Typical values of $\gamma$ for both local and $z \sim 1$ galaxy samples
are 1.7-1.8 (Zehavi et al. 2005; Coil et al. 2006c).

The scaling of uncertainties with $\sigma_z$ may be understood
as the combination of two effects. First, if the $x$ coordinate
of a distribution is rescaled by some factor, errors
in quantities proportional to $x$ should be rescaled by the
same factor, so it is not surprising that $\sigma \propto (\sigma_z/0.1)$ at
least. However, there is an additional factor: when $\sigma_z$
is smaller, $\phi_p(z)$ is more concentrated about the mean
value, so fractional errors in $\phi_p$ from Poisson statistics
are smaller about the peak, leading to the additional factor
of $(\sigma_z/0.1)^{0.5}$.

If sample variance is corrected for using the observed
fluctuations in $dN_s/dz$, the uncertainty in determining
$\langle z \rangle$ is fit fairly well (to within 20%) by:



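$$\sigma = 1.4 \times 10^{-3} \left(\frac{\sigma_z}{0.1}\right) \left(\frac{\Sigma_p}{10}\right)^{-0.3} \left(\frac{dN_s/dz}{25,000}\right)^{-1/2} \left(\frac{4\,h^{-1}\,{\rm Mpc}}{r_{0,sp}}\right)^{\gamma} \left(\frac{10\,h^{-1}\,{\rm Mpc}}{r_{\rm max}}\right)^{2-\gamma}, \tag{13}$$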



while the uncertainty in $\sigma_z$ proves to be 10% smaller.
The dependence of errors upon $\sigma_z$ and $\Sigma_p$ is significantly
weaker in this scenario. When $\sigma_z$ is smaller, the true
redshift distribution covers a smaller range in $z$, making
the impact of sample variance larger; while when $\Sigma_p$ increases,
Poisson errors decrease but sample variance does
not, so the gain from a higher surface density is reduced.

For comparison, the estimated requirements for ambitious
future surveys such as SNAP or LSST are
$\sigma(\langle z \rangle) < 2$-$4 \times 10^{-3}$ at $z \sim 1$ (see §1); throughout the
remainder of this paper, we will take $3 \times 10^{-3}$ as a reasonable
target for these projects, and indicate it by a dashed
line in the relevant figures. Even with one-tenth the surface
density assumed in our standard scenario, estimated
errors are within this limit. Cross-correlation techniques
can meet the calibration requirements of next-generation
dark energy surveys.
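
As a quick numerical check of the statements above, the two fitting formulae can be evaluated directly. The sketch below assumes the exponents quoted in this section ($\sigma_z^{1.5}$ and $\Sigma_p^{-1/2}$ when sample variance is negligible; $\sigma_z^{1}$ and $\Sigma_p^{-0.3}$ when it is corrected using the observed $dN_s/dz$) and holds the clustering-dependent factors at their fiducial values, so that they equal one. The function names are illustrative only.

```python
TARGET = 3e-3  # adopted requirement on sigma(<z>) at z ~ 1


def error_negligible_sample_variance(sigma_z=0.1, surface_density=10.0, dns_dz=25000.0):
    """Fitting formula for the case where sample variance is negligible."""
    return (9.1e-4 * (sigma_z / 0.1) ** 1.5
            * (surface_density / 10.0) ** -0.5
            * (dns_dz / 25000.0) ** -0.5)


def error_corrected_sample_variance(sigma_z=0.1, surface_density=10.0, dns_dz=25000.0):
    """Fitting formula for <z> when sample variance is corrected with dN_s/dz."""
    return (1.4e-3 * (sigma_z / 0.1)
            * (surface_density / 10.0) ** -0.3
            * (dns_dz / 25000.0) ** -0.5)


if __name__ == "__main__":
    for density in (10.0, 1.0):  # standard scenario, then one-tenth of it
        for formula in (error_negligible_sample_variance,
                        error_corrected_sample_variance):
            err = formula(surface_density=density)
            print(f"{formula.__name__}: Sigma_p={density:g} -> {err:.2e} "
                  f"(within target: {err < TARGET})")
```

Under these assumptions, both formulae stay below the $3 \times 10^{-3}$ target even at one galaxy per square arcminute.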

Fig. 2. Examples of individual Monte Carlo realizations for the recovery of $\phi_p(z)$ using the combinations of current spectroscopic
datasets (left) or of current and future datasets (right) shown in Fig. 1. Each realization was generated by randomly drawing from realistic
error distributions for the recovery of $\phi_p(z)$ in bins of width $\Delta z = 0.01$. Plotted in blue is the true, input redshift distribution, given by
Equation 11 with $\sigma_z = 0.1$. The black histogram shows one realization for the distribution measured using cross-correlation techniques,
with realistic errors determined as described in §2.2. Shown in red is the distribution determined from a least-squares fit to the simulated
data shown by the black histogram. The recovery is good enough in each case that the blue curve is essentially invisible.

TABLE 1
Assumed Redshift Survey Samples

Survey name                               # of high-confidence z's   z_0^a or redshift range   Reference

Current Samples
Sloan Digital Sky Survey (SDSS)           800,000                    0.017                     Strauss et al. 2002
AGN & Galaxy Evolution Survey (AGES)      10,000                     0.09                      Kochanek et al. 2004
DEEP2 Galaxy Redshift Survey, EGS         8,200                      0.225                     Davis et al. 2006
DEEP2 Galaxy Redshift Survey, non-EGS     17,000^b                   0.225^b                   Faber et al. 2007
VIMOS/VLT Deep Survey                     10,000^c                   0.27                      Le Fevre et al. 2005
Lyman/Balmer break samples, 1.5 < z < 4   2,500                      1.5 < z < 4               Steidel et al. 1999, 2003, 2004

Near-future Samples
WiggleZ                                   350,000                    0.25 < z < 1^d            Glazebrook et al. 2007, in prep.
PRIMUS                                    300,000                    0.23^e                    Eisenstein et al. 2007, in prep.
zCOSMOS, I < 22.5                         5,000^c                    0.23                      Lilly et al. 2006
zCOSMOS, high-z                           2,500^c                    1.5 < z < 2.5             Lilly et al. 2006
Lyman/Balmer-break samples, 1.5 < z < 4   5,000                      1.5 < z < 4               N/A

^a Except where redshift ranges are specified, we assume redshift distributions are proportional to $z^2 e^{-z/z_0}$ and so have median redshift
given by $2.67 z_0$ and mean redshift $3 z_0$. We then use published median redshifts or photometric depths to estimate $z_0$, as described in
the text.
^b Outside of the Extended Groth Strip (EGS), DEEP2 uses a color cut complete for $z > 0.75$ (~50% complete at $z = 0.7$).
[O II] 3727 and the 4000-Å break leave the DEIMOS spectral window at $z \sim 1.4$, so redshift success is minimal beyond that point. We
therefore include only redshift quality = 4 objects with $0.7 < z < 1.4$ in this count and the model redshift distribution.
^c For VVDS and zCOSMOS, we take a 20% rate of flag = 4 redshifts, following Le Fevre et al. (2005), and take a full VVDS sample size of 50000 objects.
We optimistically assume that the median redshifts given in Le Fevre et al. (2005) apply for the flag = 4 galaxies, though very few of
them are at $z > 1$ (Ilbert et al. 2005).
^d The redshift distribution from test WiggleZ observations may be roughly approximated by a
Gaussian of mean 0.55 and $\sigma$ 0.25, truncated at $z = 0.25$ and $z = 1$.
^e We presume that, in addition to being I-band limited, PRIMUS
will generally not measure redshifts for objects with $z > 0.95$ (A. Coil, priv. comm.).

Fig. 3. Errors in the recovery of the mean redshift, $\langle z \rangle$ (red)
or the RMS dispersion of redshifts, $\sigma_z$ (blue) for objects in a photometric
sample versus their surface density in galaxies per square
arcminute, $\Sigma_p$, as measured in our Monte Carlo simulations. Note
that for a single photometric redshift bin drawn from a larger sample,
$\Sigma_p$ is the surface density only for objects in that bin, not
for the overall sample. The black, dashed line indicates the estimated
maximum error in $\langle z \rangle$ allowable for proposed dark energy
surveys using the SNAP satellite or LSST. We assume a spectroscopic
sample with $dN_s/dz = 25,000$ (roughly corresponding to
current samples at $z \sim 1$) and a true $\phi_p(z)$ having $\sigma_z = 0.1$. If
sample variance is negligible, both errors scale as $\sigma \propto \Sigma_p^{-1/2}$; if
it has maximal impact, their scaling is weaker, $\sigma \propto \Sigma_p^{-0.3}$. If the
photometric sample has very low surface density, larger numbers
of redshifts or a narrower redshift distribution than assumed in
our standard scenario may be required to meet the requirements
of future dark energy surveys.

3.1.1. Combining Survey Samples

We now consider more realistic scenarios, where the
spectroscopic sample is a combination of real or planned
redshift surveys. In general, different redshift surveys
will be optimized for different depths or redshift ranges;
hence, to cover the full possible redshift range of the
photometric objects, a combination of different redshift
surveys is likely to be used. This does not present any
fundamental problems; $\xi_{sp}$ and $\xi_{ss}$ may be determined
and $\phi_p(z)$ estimated separately for each dataset; the resulting
$\phi_p(z)$ from each sample may be combined with
weighted means.
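
One way to form such a weighted mean is sketched below. The text does not specify the weights, so inverse-variance weighting is an assumption here, as are the function name and the convention of marking an uncovered redshift bin with an infinite error.

```python
import math


def combine_weighted(estimates, errors):
    """Inverse-variance weighted mean of phi_p(z), bin by bin.

    estimates[k][i] and errors[k][i] are the value and 1-sigma error in
    redshift bin i from spectroscopic sample k. An error of math.inf marks
    a bin that sample k does not cover (it then receives zero weight).
    """
    n_bins = len(estimates[0])
    combined, combined_err = [], []
    for i in range(n_bins):
        weights = [1.0 / err[i] ** 2 for err in errors]  # 1/inf**2 == 0.0
        total = sum(weights)
        if total == 0.0:                                  # no sample covers this bin
            combined.append(math.nan)
            combined_err.append(math.inf)
            continue
        value = sum(w * est[i] for w, est in zip(weights, estimates)) / total
        combined.append(value)
        combined_err.append(1.0 / math.sqrt(total))
    return combined, combined_err
```

For example, two samples measuring $1.0 \pm 0.1$ and $2.0 \pm 0.2$ in the same bin combine to 1.2 with an error of about 0.089.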

Fig. 4. Errors in the recovery of $\langle z \rangle$ (red) or $\sigma_z$ (blue) versus
the true value of $\sigma_z$, from our Monte Carlo tests. The black,
dashed line indicates the estimated maximum error in $\langle z \rangle$ allowable
for proposed dark energy surveys using LSST or the SNAP satellite.
We assume here a spectroscopic sample with $dN_s/dz = 25,000$
(roughly corresponding to current samples at $z \sim 1$) and a photometric
sample with a surface density of 10 galaxies per square
arcminute. If sample variance is negligible, both errors scale as
$\sigma_z^{1.5}$; if it has maximal impact, their $\sigma_z$-dependence is weaker,
$\sigma \propto \sigma_z$. In all plotted cases, the errors in measuring the parameters
of the redshift distribution are much smaller than required for
future dark energy surveys.

If random errors on each sample's clustering measurements
are small ($\lesssim 1$%), as is generally true in large
modern surveys, the net errors on $\phi_p(z)$ should be the
same when we simply use the aggregate $dN_s/dz$ as if
we measure separately and combine (we test the impact
of clustering measurement uncertainties in §4.2). We
hence estimate the combined $dN_s/dz$ from current samples
and from surveys that have recently begun observations.
The samples considered, their estimated median
redshifts, and each sample's total number of galaxies are
given in Table 1. Where other information is not available,
we have used announced or published magnitude
limits and the fitting formulae from Coil et al. (2004b)
to estimate median redshifts. We assume that except for
hard redshift limits set by color cuts or lack of features in
spectral windows, all $z < 2$ samples have redshift distributions
of the form $z^2 e^{-z/z_0}$, which fits current datasets
well (Coil et al. 2004b); for distributions of this form,
$z_0 = {\rm median}(z)/2.67$. For $z > 2$, we assume that $dN_s/dz$
will be flat, the rough consequence of applying a wide
variety of high-redshift selections targeted at different $z$
ranges. The estimated combined redshift distributions
for current and near-future samples used here are shown
in Figure 1.
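
The relation between $z_0$ and the median follows from the cumulative distribution of $z^2 e^{-z/z_0}$, which has a closed form. The sketch below (function names are illustrative) recovers the factor 2.67 by bisection and checks the corresponding mean, $3 z_0$, by direct numerical integration.

```python
import math


def cdf(x):
    """CDF of p(x) = x^2 exp(-x) / 2, where x = z / z_0."""
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)


def median_over_z0(tol=1e-12):
    """Solve cdf(x) = 1/2 by bisection; the result is median(z) / z_0."""
    lo, hi = 0.0, 20.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_over_z0(n_steps=20000, x_max=60.0):
    """Midpoint-rule integral of x * p(x); the result is mean(z) / z_0."""
    dx = x_max / n_steps
    total = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * dx
        total += x * (0.5 * x * x * math.exp(-x)) * dx
    return total


def z0_from_median(median_z):
    """Convert a published median redshift into z_0 for this functional form."""
    return median_z / median_over_z0()
```

Here `median_over_z0()` evaluates to approximately 2.674 and `mean_over_z0()` to 3, matching the values used in the text and in Table 1.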

For the sample of current surveys (all but zCOSMOS,
WiggleZ, and PRIMUS from Table 1), we then find:

while for a reasonable near-future scenario (including
only projects that have begun observations), we find:

corresponding well to the tolerances for future dark energy
experiments ($\sigma(\langle z \rangle) \lesssim 3 \times 10^{-3}$, as described in
§1). We have assumed that sample variance is negligible
for the near-future scenario, as the largest-area projects
(SDSS and WiggleZ) plan to survey at least 1000 square
degrees each (see §3.4).

Fig. 5. Errors in the recovery of $\langle z \rangle$ (red) and $\sigma_z$ (blue) versus
the number of spectroscopic galaxies per unit redshift, $dN_s/dz$,
from our Monte Carlo tests. The black, dashed line indicates
the estimated maximum error allowable for proposed dark energy
surveys using the SNAP satellite or LSST. We assume here that
the photometric sample has a surface density of 10 galaxies per
square arcminute and a true $\phi_p(z)$ having $\sigma_z = 0.1$. Regardless
of assumptions about sample variance, all uncertainties scale as
$(dN_s/dz)^{-1/2}$. If $dN_s/dz$ is small, as at $z > 2$ currently, meeting
the tolerances of future dark energy surveys may be problematic.
However, accuracy requirements at $z \sim 2$ are in general less restrictive
than the $z \sim 1$ tolerance plotted here, as angular diameter
distance and lookback time evolve more slowly with redshift at
higher $z$.

3.2. Non-parametric reconstruction

Although these methods are highly successful at
reconstructing the parameters of $\phi_p(z)$ given the correct
general model, we can also test how well we may recover
$\langle z \rangle$ when making no assumptions about $\phi_p(z)$ at all.
We hence calculate the recovered mean redshift for each
Monte Carlo realization, $\langle z \rangle = \sum_i z_i \phi_i / \sum_i \phi_i$, where $z_i$
is the redshift of the $i$th bin, $\phi_i$ is the recovered $\phi_p(z)$
in that bin, and we use $\sum_i$ to indicate summation over
all bins $i$. We then find that, for our standard scenario
and averaging over the redshift range $0 < z < 2$, the
standard deviation of these mean redshift estimates is:

$$\sigma(\langle z \rangle) = 6.9 \times 10^{-3} \left(\frac{\Sigma_p}{10}\right)^{-1/2} \left(\frac{dN_s/dz}{25,000}\right)^{-1/2} \left(\frac{4\,h^{-1}\,{\rm Mpc}}{r_{0,sp}}\right)^{\gamma} \left(\frac{10\,h^{-1}\,{\rm Mpc}}{r_{\rm max}}\right)^{2-\gamma} \tag{16}$$

if sample variance is corrected using the observed
$dN_s/dz$; the prefactor is $6.4 \times 10^{-3}$ if sample variance
is negligible. These errors would be reduced if the redshift
range considered is more limited. We plan to further
explore the effectiveness of nonparametric reconstruction
of redshift distributions in future work.
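
The non-parametric estimator is simple enough to state in code. The sketch below also illustrates why restricting the redshift range helps: bins far from the peak carry no signal but still contribute noise, weighted by their distance from the mean. The constant per-bin error `sigma_phi` is a hypothetical stand-in for the realistic, bin-dependent errors used in the paper, so the absolute numbers are illustrative only.

```python
import math
import random
import statistics


def gaussian(z, mean, sigma):
    """Normalized Gaussian in redshift."""
    u = (z - mean) / sigma
    return math.exp(-0.5 * u * u) / (math.sqrt(2.0 * math.pi) * sigma)


def mean_redshift(z_bins, phi):
    """<z> = sum_i z_i phi_i / sum_i phi_i, with no model for phi_p(z)."""
    return sum(z * p for z, p in zip(z_bins, phi)) / sum(phi)


def scatter_of_mean(z_min, z_max, n_real=300, dz=0.01, sigma_phi=0.05, seed=2):
    """Scatter of the non-parametric <z> over noisy realizations."""
    rng = random.Random(seed)
    n_bins = int(round((z_max - z_min) / dz))
    z_bins = [z_min + (i + 0.5) * dz for i in range(n_bins)]
    truth = [gaussian(z, 1.0, 0.1) for z in z_bins]
    estimates = []
    for _ in range(n_real):
        phi_obs = [t + rng.gauss(0.0, sigma_phi) for t in truth]
        estimates.append(mean_redshift(z_bins, phi_obs))
    return statistics.stdev(estimates)
```

With these assumed inputs, `scatter_of_mean(0.0, 2.0)` is close to three times larger than `scatter_of_mean(0.5, 1.5)`.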

3.3. Impact of redshift outliers 

The objects that fall within a photometric redshift bin
generally are not a pure population. Non-Gaussian photometric
errors, e.g. due to contamination by light from
overlapping objects, may cause some galaxies to incorrectly
be placed in a given bin (our sample $p$), while in
other cases the observed colors of galaxies at very different
redshifts may be degenerate, causing $\phi_p(z)$ to be
multimodal.

If $\phi_p(z)$ consists of a combination of multiple Gaussians
that overlap only minimally, the scalings from Equation
12 should hold for each peak, save that we must replace
$\Sigma_p$, the total surface density of photometric objects, by
$f\Sigma_p$, where $f$ is the fraction of sample $p$ that is associated
with a given peak. As the peaks begin to overlap,
however, this prescription will fail. We have therefore
adapted our Monte Carlo simulations to test the recovery
of a distribution function consisting of two Gaussian
peaks of equal width and amplitude (centered at redshifts
$z_1$ and $z_2$) as $z_2$ approaches $z_1$; i.e., we employ
a distribution function $\phi_p(z) = (8\pi\sigma_z^2)^{-1/2} \left(\exp(-(z - z_1)^2/2\sigma_z^2) + \exp(-(z - z_2)^2/2\sigma_z^2)\right)$.
For convenience, we
take $z_1 = 1$ and $z_2 < z_1$, though behavior should not
depend strongly on these choices. The uncertainties in
measuring $\langle z \rangle$ and $\sigma_z$ for $\sigma_z = 0.1$ are shown in Fig. 6;
we obtain qualitatively similar results for $\sigma_z = 0.05$ or
0.2.

Fig. 6. Errors in the recovery of the mean redshift of the
photometric sample, $\langle z \rangle$ (top), or the Gaussian $\sigma$ of the redshift
distribution, $\sigma_z$ (bottom), for scenarios where the redshift distribution
of the photometric sample $\phi_p(z)$ consists of two Gaussian
peaks of equal height, one centered at redshift 1 and the other at
redshift $z_2$. We assume for these simulations that the photometric
sample has a surface density of 10 galaxies per square arcminute,
that each peak has Gaussian $\sigma_z = 0.1$, and that the spectroscopic
sample has $dN_s/dz = 25,000$. Although recovery of $\sigma_z$ is degraded
when the two peaks are not resolved, measurement of $\langle z \rangle$ improves
compared to the intermediate regime. Results are qualitatively
similar if the true $\sigma_z$ is changed.

Matching the predictions above, when the two peaks
overlap minimally, the error in the recovered mean, $\langle z \rangle = (\langle z_1 \rangle + \langle z_2 \rangle)/2$,
is equal to the sum in quadrature of the errors
in determining each peak's position, divided by two.
This is equal to the error obtained for a single Gaussian
peak (since $f = 0.5$, the errors in $\langle z_1 \rangle$ and $\langle z_2 \rangle$ are $\sqrt{2}$
times larger than the single-peak error, but the error of
the mean of the two quantities is $1/\sqrt{2}$ as large as the error
in one). As $z_2$ approaches 1, errors reach a maximum
of $\sim 1.5\times$ the minimum value when $|z_2 - z_1| \approx 2.5\sigma_z$,
and then decrease monotonically, approaching the minimum
value again when $|z_2 - z_1| \ll \sigma_z$. The behavior
of the uncertainty in recovering $\sigma_z$ is more complex,
rising rapidly (by $> 5\times$) when the two peaks are unresolved
($|z_2 - z_1| \lesssim 1.2\sigma_z$); fortunately, dark energy
experiments are generally less affected by errors in $\sigma_z$
than $\langle z \rangle$ (Ma et al. 2006).
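
The arithmetic behind the well-separated limit can be written out explicitly. The sketch below assumes the $\Sigma_p^{-1/2}$ scaling that applies when sample variance is negligible; the function names are illustrative.

```python
import math


def per_peak_error(single_peak_error, fraction, density_exponent=-0.5):
    """Error on one peak's position when it holds `fraction` of sample p.

    Replacing Sigma_p by f * Sigma_p rescales the error by f**density_exponent.
    """
    return single_peak_error * fraction ** density_exponent


def error_of_mean_of_two(error_1, error_2):
    """Error on (<z_1> + <z_2>) / 2 for independent peak positions."""
    return math.hypot(error_1, error_2) / 2.0


single = 9.1e-4                                    # single-peak error, standard scenario
each = per_peak_error(single, fraction=0.5)        # sqrt(2) times larger
combined = error_of_mean_of_two(each, each)        # back to the single-peak value
```

The two factors of $\sqrt{2}$ cancel, so `combined` equals `single`, as stated above for minimally overlapping peaks.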

Alternatively, we can consider a scenario where redshift
outliers share the same mean as the overall sample,
but have a broader $\sigma_z$. For this, we consider a distribution
function

$$\phi_p(z) = (1 - f_{\rm outlier})(2\pi\sigma_1^2)^{-1/2} \exp\left(-\frac{(z - z_0)^2}{2\sigma_1^2}\right) + f_{\rm outlier}(2\pi\sigma_2^2)^{-1/2} \exp\left(-\frac{(z - z_0)^2}{2\sigma_2^2}\right);$$

that is, the sum of two Gaussians, with total probability of
$(1 - f_{\rm outlier})$ and $f_{\rm outlier}$ and standard deviations $\sigma_1$ and $\sigma_2$,
respectively, but sharing the same mean, $z_0$. As before,
we produce Monte Carlo realizations for the recovery of
such a distribution with cross-correlation techniques, and
then fit for $f_{\rm outlier}$, $z_0$, $\sigma_1$, and $\sigma_2$. For an initial guess for
each realization, we take random values of $z_0$, $\sigma_1$, and $\sigma_2$
with an RMS dispersion of 10% about their true value,
and a value of $f_{\rm outlier}$ with RMS dispersion 20% of its
true value.

For convenience, we simulate distributions with $z_0 = 1$ and
$f_{\rm outlier} = 0.1$, a realistic value for faint samples (see,
e.g., Ilbert et al. 2006). We set $\sigma_1 = 0.05$, 0.1, or 0.2,
and investigate the recovery of $\langle z \rangle$, $\sigma_2$, and the net $\sigma$ of
the distribution, $\left((1 - f_{\rm outlier})^2 \sigma_1^2 + f_{\rm outlier}^2 \sigma_2^2\right)^{1/2}$, as a
function of $\sigma_2$. Results for $\sigma_1 = 0.1$ are shown in Fig. 7.
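
A minimal sketch of this outlier model and of the randomized initial guesses is given below. The function names and the midpoint-rule integrator are illustrative choices, not part of the original analysis.

```python
import math
import random


def outlier_model(z, f_outlier, z0, sigma1, sigma2):
    """Core Gaussian plus a broader outlier Gaussian sharing the mean z0."""
    core = math.exp(-(z - z0) ** 2 / (2.0 * sigma1 ** 2)) / math.sqrt(2.0 * math.pi * sigma1 ** 2)
    wide = math.exp(-(z - z0) ** 2 / (2.0 * sigma2 ** 2)) / math.sqrt(2.0 * math.pi * sigma2 ** 2)
    return (1.0 - f_outlier) * core + f_outlier * wide


def initial_guess(rng, f_outlier, z0, sigma1, sigma2):
    """True values perturbed by 10% RMS (20% RMS for the outlier fraction)."""
    return (f_outlier * (1.0 + rng.gauss(0.0, 0.2)),
            z0 * (1.0 + rng.gauss(0.0, 0.1)),
            sigma1 * (1.0 + rng.gauss(0.0, 0.1)),
            sigma2 * (1.0 + rng.gauss(0.0, 0.1)))


def integrate(func, a, b, n_steps=20000):
    """Midpoint-rule integral of func over [a, b]."""
    dx = (b - a) / n_steps
    return sum(func(a + (i + 0.5) * dx) for i in range(n_steps)) * dx


# The mixture integrates to one for any outlier fraction between 0 and 1.
norm = integrate(lambda z: outlier_model(z, 0.1, 1.0, 0.1, 0.3), -2.0, 4.0)
```

Because each component is separately normalized, `norm` is unity to within the accuracy of the integrator.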

The conclusions are similar for all three values of $\sigma_1$.
As seen in the top panel of Fig. 7, the presence of a
small fraction of objects with a greater $\sigma$ causes a correspondingly
small (10-20%) degradation in the recovery
of $\langle z \rangle$. This is not a great surprise. Since the two
Gaussians are required to have the same mean, the principal
impact on $\langle z \rangle$ is due to the broader effective $\sigma$ of
the distribution; as we found before, reconstruction of
$\langle z \rangle$ grows poorer when $\sigma_z$ is greater.

When $\sigma_2$ has a value close to that of $\sigma_1$, differentiating
the two components becomes difficult; therefore, errors in
both $\sigma_1$ and $\sigma_2$ are greater for low values of $\sigma_2$. Errors in
$f_{\rm outlier}$, too, are correspondingly higher in this regime.
The result is that the net $\sigma$ of the distribution can be
degraded substantially where $\sigma_2 \sim 2\sigma_1$, by more than a
factor of 3. For smaller values of $\sigma_2$ than this, analysis
becomes difficult, as the fitting routine will trade off
which Gaussian component corresponds to which piece
of the distribution. As $\sigma_2$ increases past the point where
the two distributions become distinguishable, the error in
recovering $\sigma_2$ also steadily increases, consistent with the
increase in errors in $\sigma_z$ as $\sigma_z$ increases for our standard
scenario. However, $f_{\rm outlier}$ is better determined as $\sigma_2$
goes up, such that the RMS error in the net $\sigma$ decreases
monotonically as $\sigma_2$ increases.

3.4. Impact of sample variance 

In §3.1, we established the minimum and maximum
impact sample variance will have on measurements using
cross-correlation techniques. Here, we attempt to establish
quantitatively in which regimes a correction using
the observed $dN_s/dz$, as described in §2.3, will mitigate
the additional errors caused by sample variance, and in
which regimes cosmic variance is negligible and such a
correction is inadvisable.

To do so, we have performed another set of Monte
Carlo simulations, in which we have added in quadrature
to the result of Equation 9 an additional uncertainty
corresponding to the fluctuations in the count of an unbiased
tracer of dark matter in each bin due to sample variance
when producing each realization. We assume that
a total of $N_{\rm fields}$ independent (i.e., widely separated)
fields are covered by the spectroscopic samples, with each
field having the same dimensions on the sky. For simplicity,
we consider only two field geometries here: either
1 deg $\times$ 0.5 deg (corresponding roughly to the sizes
of the independent fields surveyed by current deep surveys
such as DEEP2 and the VIMOS-VLT Deep Survey
[VVDS]), or 2 deg $\times$ 2 deg (corresponding to proposed
future surveys). We calculate the uncertainties from
sample variance using the methods of Newman & Davis
(2002), and find that the expected fractional root-mean-square
(RMS) variations in counts of an unbiased tracer
are 48% for the smaller field size and 22% for the larger
over $\Delta z = 0.01$.^1 Although the fields are $8\times$ larger in
area for 2 deg $\times$ 2 deg fields, the fluctuations in counts
due to sample variance are only $\sim 2.2\times$ smaller; the
power spectrum remains non-negligible on degree scales
at $z \sim 1$. We use the same parameters as for our standard
scenario here ($\Sigma_p = 10$ galaxies/square arcminute,
$dN_s/dz = 25,000$, and $\sigma_z = 0.1$).

^1 IDL code is available at http://astro.berkeley.edu/~jnewman/research.html.

Fig. 7. Errors in the recovery of the mean redshift of the
photometric sample, $\langle z \rangle$ (top), the Gaussian $\sigma$ of a redshift distribution,
$\sigma$ (middle), or the outlier fraction $f_{\rm outlier}$ (bottom) for
scenarios where the redshift distribution of the photometric sample
$\phi_p(z)$ consists of two Gaussian peaks with equal mean, one having
integral $(1 - f_{\rm outlier})$ and $\sigma = 0.1$, and the other having integral
$f_{\rm outlier}$ and $\sigma = \sigma_2$. All curves are plotted as a function of the
Gaussian $\sigma$ of the outlier redshift distribution, $\sigma_2$. We assume for
these simulations that the photometric sample has a surface density
of 10 galaxies per square arcminute and that the spectroscopic
sample has $dN_s/dz = 25,000$. Red curves indicate the error in the
overall mean or net $\sigma$ of the distribution; blue curves indicate the
error in recovering the width of the outlier distribution, $\sigma_2$, or the
outlier fraction. The dashed line in each case shows what the errors
would be if $f_{\rm outlier} = 0$. In the top panel, only one curve is shown,
as the two Gaussians are required to have equal mean while fitting.
Results are qualitatively similar if the value of $\sigma_1$ is 0.05 or 0.2.

For these calculations, we treat the fluctuations in
counts from sample variance in successive redshift bins
as independent; this assumption is fairly good (e.g. for
1 deg $\times$ 0.5 deg field sizes and $\Delta z = 0.01$, the covariance
between counts in adjoining redshift bins is roughly 16%
of the total variance; for $\Delta z = 0.05$, it is only 3%). The
root-mean-square errors in recovering $\langle z \rangle$ as a function of
$N_{\rm fields}$ are shown in Figure 8. For 1 deg $\times$ 0.5 deg fields,
the uncertainty when sample variance fluctuations are
not corrected for is worse than the errors resulting from
using the observed $dN_s/dz$ to make corrections so long
as $N_{\rm fields} \lesssim 15$. For the larger field size, sample variance
is worse than the uncertainties in the correction only for
$N_{\rm fields} \lesssim 3$. If more fields than this are surveyed, there is
no advantage to using fluctuations in the spectroscopic
redshift distribution to correct for sample variance. A
number of fields roughly $4$-$5\times$ greater is required for
sample variance to have completely negligible impact.
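
The way sample variance enters each realization can be sketched as follows. Two points here are assumptions rather than statements from the text: the sample-variance term is taken to be the fractional RMS fluctuation multiplied by $\phi_p$ in the bin, and averaging over $N_{\rm fields}$ independent, equal-sized fields is taken to reduce the fractional fluctuation by $\sqrt{N_{\rm fields}}$. The names are illustrative.

```python
import math

# Fractional RMS count fluctuations per field over dz = 0.01, as quoted above.
FRAC_RMS = {"1 x 0.5 deg": 0.48, "2 x 2 deg": 0.22}


def sample_variance_error(phi, frac_rms, n_fields):
    """Sample-variance error on phi_p in one bin, averaged over n_fields fields."""
    return phi * frac_rms / math.sqrt(n_fields)


def total_bin_error(base_error, phi, frac_rms, n_fields):
    """Add the sample-variance term in quadrature to the other per-bin error."""
    return math.hypot(base_error, sample_variance_error(phi, frac_rms, n_fields))


# Area grows by a factor of 8, but the fluctuations shrink by only ~2.2,
# less than the sqrt(8) ~ 2.8 expected if counts in sub-areas were independent.
area_ratio = (2.0 * 2.0) / (1.0 * 0.5)
fluctuation_ratio = FRAC_RMS["1 x 0.5 deg"] / FRAC_RMS["2 x 2 deg"]
independent_expectation = math.sqrt(area_ratio)
```

The gap between `fluctuation_ratio` and `independent_expectation` reflects the large-scale clustering power noted above.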

3.5. Covariance of Sample Variance 

Our Monte Carlo simulations assumed that fluctuations
in counts due to sample variance are independent
between all redshift bins. However, this is of course not
the case; e.g., high peaks will tend to cluster together.
We therefore must assess to what degree this covariance
will worsen the reconstruction of $\phi_p(z)$.

A simple way to test this is to determine how errors in
the parameters of $\phi_p(z)$ change when the redshift bin size
used is altered. The Monte Carlo simulations described
above incorporate the total sample variance within whatever
bin size is used; e.g., the errors from sample variance
used if bin sizes double are not simply $1/\sqrt{2}$ as large as
before. One caveat for this test is that recovery of the parameters
of $\phi_p(z)$ may be affected by discreteness effects
(the mean of $\phi_p(z)$ within a bin is not identical to the
value of $\phi_p(z)$ at the bin's center, although it is treated
as such in fitting) when redshift bins become large, independent
of any sample variance effects.

For our standard scenario ($\sigma_z = 0.1$, 4 independent
fields, $\Sigma_p = 10$), we find that errors in $\langle z \rangle$ and $\sigma_z$ rise
steadily as bin size increases if the observed $dN_s/dz$ is not
used to correct for sample variance fluctuations, reaching
25% larger values for $\Delta z = 0.1$ bins than for $\Delta z = 0.01$.
If $dN_s/dz$ is used for corrections, however, errors in reconstruction
are flat with bin size to 1% or better; this is
no surprise, as in such scenarios, we are limited by Poisson
uncertainties in the correction (which should have no
covariance between bins) rather than the sample variance
itself.

Because of the possibility of discreteness effects, we
have investigated the effects of the covariance of sample
variance with a model that may be employed even for
small bin sizes. We proceed by showing that the dominant
effect is a covariance only between successive redshift
bins, with larger-scale effects being comparatively
negligible; and then show that incorporating this leading-order
effect causes only a modest degradation in the recovery
of $\langle z \rangle$ and $\sigma_z$.

Specifically, let us suppose that the fractional fluctuation
in a count from sample variance in the $i$th redshift
bin, which we will label $s_i$, is covariant only with the
fluctuations in the neighboring bins ($s_{i-1}$ and $s_{i+1}$). We
also assume that the RMS variations from sample variance
in each bin are equal (this holds to high accuracy
for bins of constant $\Delta z$, based upon
tests with the QUICKCV code from Newman & Davis
2002) and that the covariance between bins similarly
does not depend on $z$; we only need these assumptions
to hold locally (i.e., for small $\Delta z$). Even if present, small
asymmetries in the impact of sample variance with redshift
would affect this test only modestly, however, as
they would simply mean that the RMS impact is slightly
less on one side than another, but the overall effect of
the covariance would remain largely unchanged. If the
impact of covariance proved to be large, this might begin
to make a quantitative difference, and it might be
necessary to include such effects.

As an additional caveat, because it calculates in real
space, QUICKCV does not account for the fact that correlation
functions and the power spectrum are asymmetric
in redshift space. Peculiar velocities will cause the
true covariance between successive bins to be overestimated
by the procedure below, as their dominant effect
on large scales is the "Kaiser infall" (Kaiser 1987), which
causes large structures to appear collapsed along the
line of sight in redshift space. This is particularly the
case for optically-selected samples at high redshift, which
are biased towards intrinsically blue galaxies and have
only very weak "Fingers of God" (Coil et al. 2007), or
if the most nonlinear, sub-Mpc scales are excluded from
cross-correlation analyses, as suggested in §2.2. The net
effect of redshift-space distortions is that sample variance
fluctuations will be more confined to a single redshift bin
than one would expect from a real-space calculation, so
our estimated errors from this model will be conservative
(i.e., biased high).

Given our assumptions, we may presume that for each
bin there is an underlying, 'hidden' variable, $s'_i$, which
has Gaussian random variations completely independent
of the adjoining bins; and we can write

$$s_i = (1 - 2w)\,s'_i + w\,s'_{i-1} + w\,s'_{i+1}, \tag{17}$$

where $w$ is some unknown weight factor. We take the
RMS variation for each of the uncorrelated $s'_i$ to be given
by the variable $\sigma_u$ (for uncorrelated), which by assumption
is the same for all $i$.

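Given this model and standard propagation of errors,
it is possible to predict the net fluctuation due to sample
variance for a single bin of width $\Delta z$, $2\Delta z$, $3\Delta z$, etc.
The first three of these are:

$$\sigma^2_{\Delta z} = (6w^2 - 4w + 1)\,\sigma_u^2 \tag{18}$$

$$\sigma^2_{2\Delta z} = \tfrac{1}{4}(4w^2 - 4w + 2)\,\sigma_u^2 \tag{19}$$

$$\sigma^2_{3\Delta z} = \tfrac{1}{9}(4w^2 - 4w + 3)\,\sigma_u^2. \tag{20}$$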
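
For reference, the coefficients in Equations 18-20 follow from Equation 17 by summing the fluctuations of adjacent bins and collecting the weight on each independent $s'_j$. The short derivation below is a sketch that assumes equal expected counts in each bin, so that the fractional fluctuation of a merged bin is the simple average of its components:

```latex
\begin{align*}
s_i &= w\,s'_{i-1} + (1-2w)\,s'_i + w\,s'_{i+1}
  &&\Rightarrow\ \sigma^2_{\Delta z} = \left[2w^2 + (1-2w)^2\right]\sigma_u^2
     = (6w^2 - 4w + 1)\,\sigma_u^2, \\
\tfrac{1}{2}\,(s_i + s_{i+1}) &= \tfrac{1}{2}\left[w\,s'_{i-1} + (1-w)\,s'_i + (1-w)\,s'_{i+1} + w\,s'_{i+2}\right]
  &&\Rightarrow\ \sigma^2_{2\Delta z} = \tfrac{1}{4}\left[2w^2 + 2(1-w)^2\right]\sigma_u^2
     = \tfrac{1}{4}(4w^2 - 4w + 2)\,\sigma_u^2, \\
\tfrac{1}{3}\,(s_{i-1} + s_i + s_{i+1}) &= \tfrac{1}{3}\left[w\,s'_{i-2} + (1-w)\,s'_{i-1} + s'_i + (1-w)\,s'_{i+1} + w\,s'_{i+2}\right]
  &&\Rightarrow\ \sigma^2_{3\Delta z} = \tfrac{1}{9}\left[2w^2 + 2(1-w)^2 + 1\right]\sigma_u^2
     = \tfrac{1}{9}(4w^2 - 4w + 3)\,\sigma_u^2.
\end{align*}
```

Setting $w = 0$ recovers the independent-bin results, $\sigma_{2\Delta z} = \sigma_{\Delta z}/\sqrt{2}$ and $\sigma_{3\Delta z} = \sigma_{\Delta z}/\sqrt{3}$.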

Therefore, given numerical predictions for $\sigma_{\Delta z}$ and $\sigma_{2\Delta z}$
given the full power spectrum (predictions which we take
from QUICKCV), it is possible to solve for $w$ and $\sigma_u$ in
this model. We can then use the same code to determine
$\sigma_{3\Delta z}$, and compare it to the prediction of the model, in
order to assess the model's effectiveness at incorporating
the impact of the covariance of sample variance.
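
Solving for $w$ and $\sigma_u$ reduces to a quadratic in $w$, obtained from the ratio of the variances for bins of width $2\Delta z$ and $\Delta z$. The sketch below is one way to do this, with illustrative function names; it selects the smaller root, which is the one that goes to $w = 0$ when the bins are independent. The input values would come from a code such as QUICKCV, which is not called here.

```python
import math


def var_factor_1(w):
    """sigma^2 for a bin of width dz, in units of sigma_u^2."""
    return 6.0 * w * w - 4.0 * w + 1.0


def var_factor_2(w):
    """sigma^2 for a bin of width 2 dz, in units of sigma_u^2."""
    return (4.0 * w * w - 4.0 * w + 2.0) / 4.0


def var_factor_3(w):
    """sigma^2 for a bin of width 3 dz, in units of sigma_u^2."""
    return (4.0 * w * w - 4.0 * w + 3.0) / 9.0


def solve_w_sigma_u(sigma_1dz, sigma_2dz):
    """Recover (w, sigma_u) from the RMS fluctuations in dz and 2 dz bins."""
    ratio = (sigma_2dz / sigma_1dz) ** 2
    # ratio * 4 * (6w^2 - 4w + 1) = 4w^2 - 4w + 2, rearranged as a w^2 + b w + c = 0
    a = 24.0 * ratio - 4.0
    b = 4.0 - 16.0 * ratio
    c = 4.0 * ratio - 2.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("inputs are inconsistent with the nearest-neighbour model")
    w = (-b - math.sqrt(disc)) / (2.0 * a)  # root continuous with w = 0
    sigma_u = sigma_1dz / math.sqrt(var_factor_1(w))
    return w, sigma_u


def predict_sigma_3dz(w, sigma_u):
    """Model prediction for the RMS fluctuation in a bin of width 3 dz."""
    return sigma_u * math.sqrt(var_factor_3(w))
```

The model prediction from `predict_sigma_3dz` can then be compared with an independent calculation for the $3\Delta z$ bin.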

We find that this simple model is highly effective. For
bin sizes (i.e. $\Delta z$) ranging from 0.003 to 0.1, ignoring
the covariance of sample variance completely (so that
$\sigma_{3\Delta z} = \sigma_{\Delta z}/\sqrt{3}$) underpredicts the RMS fractional variation
in a count in a bin of width $3\Delta z$ by anywhere from
2% (for $\Delta z = 0.1$) to 23% (for $\Delta z = 0.003$); for our standard
$\Delta z = 0.01$ bins, the underprediction is 15% (all
tests are for a 1 deg $\times$ 1 deg field with central redshift
$z = 1$). If we employ the model described above, however,
the underprediction ranges from below 0.1% (for
$\Delta z > 0.025$) to 3% (for $\Delta z = 0.003$); for $\Delta z = 0.01$
the prediction is off by 0.6%. Clearly, the vast majority
of the effect of the covariance of sample variance is
described simply by a covariance between successive redshift
bins. This holds true even if we had tested over a
larger $z$ range; e.g., for $\Delta z = 0.01$, the model correctly
predicts the net variance over $5\Delta z$ to within 1.5%, or over
$10\Delta z$ to within 2.1%. However, in the latter case, the assumption
that all the bins are statistically independent
performs even better, matching to 0.4%; so this model
will actually overpredict the impact of the covariance of
sample variance on large scales.

The form of Equation 17 is particularly convenient for
incorporation into our Monte Carlo tests. Instead of
simply randomly drawing the fractional fluctuation from
sample variance in a given bin, $s_i$, as before, we
can draw the uncorrelated $s'_i$, and then construct the
correlated $s_i$ from the uncorrelated $s'_i$ in making each
Monte Carlo realization; we need only predict $w$ and $\sigma_u$
(which we do using QUICKCV). The impact of adding
this covariance to our models may be seen in Fig. 8; errors
are increased by roughly 30% in the worst case (for
$N_{\rm fields} = 1$), but by $< 15$% for $N_{\rm fields} > 10$, the regime
in which correcting for sample variance fluctuations with
the observed $dN_s/dz$ becomes ineffective. For our scaling
scenario ($N_{\rm fields} = 4$), errors are degraded by 27%,
slightly worse than the difference between $\Delta z = 0.01$
and 0.10. We expect that any corrections from the much
smaller, larger-range covariance would be considerably
less than this; hence, we conclude that the covariance of
sample variance has relatively minor impact on our results.
The prefactor of $9.1 \times 10^{-4}$ in Equation 12 becomes $1.2 \times 10^{-3}$
when this covariance is accounted for, still well within
SNAP and LSST requirements.
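
Drawing correlated fluctuations in this way takes only a few lines. The sketch below is an illustration with assumed values of $w$ and $\sigma_u$ (in practice these would be predicted from the power spectrum); it pads the array of uncorrelated draws by one bin at each end so that every bin has both neighbours.

```python
import random


def correlated_fluctuations(n_bins, w, sigma_u, rng):
    """Build s_i = (1 - 2w) s'_i + w s'_{i-1} + w s'_{i+1} from independent s'_i."""
    s_prime = [rng.gauss(0.0, sigma_u) for _ in range(n_bins + 2)]
    return [(1.0 - 2.0 * w) * s_prime[i + 1] + w * s_prime[i] + w * s_prime[i + 2]
            for i in range(n_bins)]


def variance_and_neighbor_covariance(values):
    """Sample variance and covariance between adjacent entries (zero-mean data)."""
    n = len(values)
    var = sum(v * v for v in values) / n
    cov = sum(values[i] * values[i + 1] for i in range(n - 1)) / (n - 1)
    return var, cov


# Illustrative check with assumed w = 0.1 and sigma_u = 1.
draws = correlated_fluctuations(100000, 0.1, 1.0, random.Random(3))
var, cov = variance_and_neighbor_covariance(draws)
```

For these assumed values the variance should approach $(6w^2 - 4w + 1)\sigma_u^2 = 0.66$ and the covariance between adjacent bins $2w(1 - 2w)\sigma_u^2 = 0.16$. The construction also produces a small covariance, $w^2\sigma_u^2$, between next-nearest bins, which is second order in $w$.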

The impact is even smaller for our standard scenario,
for which the observed $dN_s/dz$ is used to correct for sample
variance in each redshift bin. In that case, the Poisson
uncertainty in this correction is far greater than the
covariance from sample variance, and the latter becomes
totally negligible; hence, all the major results of this paper
are unaffected by this covariance.

4. POSSIBLE SYSTEMATICS 

We now consider the impact of a variety of effects that
violate the simple assumptions underlying our basic scenario.
We will treat these errors analytically wherever
possible. We summarize the results of this section in
Table 2.

4.1. Evolution in bias 

Inspection of Equation [2] shows that to transform the
cross-correlation signal $w_{sp}$ into $\phi_p(z)$, we require
knowledge of the parameters of $\xi_{sp}$, namely $\gamma$ and
$r_{0,sp}$, as a function of redshift. As described in §2.1,
however, $w_{pp}$ provides constraints only on the mean clustering
of the photometric sample. What is the impact on the derived
$\phi_p$ if these parameters evolve?

Fig. 8. Errors in the recovery of $\langle z \rangle$ versus the number of
independent fields surveyed for two different field geometries (red
and blue curves), resulting from Monte Carlo tests in which sample
variance errors are added to our standard scenario ($\Sigma_p = 10$
galaxies per square arcminute, $\sigma_z = 0.1$, $dN_s/dz = 25{,}000$).
As the number of fields increases or if larger fields are used, errors
from sample variance decrease. However, an upper limit on the
practical impact of sample variance is set by the black, dashed curve,
which indicates the errors if the observed $dN_s/dz$ distribution is
used to correct for the density fluctuations in each redshift bin.
The errors after this correction are set by Poisson statistics from
the spectroscopic sample, rather than by clustering. For current
redshift samples, applying such a correction (as assumed in the
preceding plots) is favored; however, WiggleZ and subsequent surveys
should cover ~ 1000 square degrees each, so sample variance will
affect them only minimally. The red, dot-dashed curve shows what
the errors would be if sample variance were not covariant between
redshift bins; see §3.5.

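We test this by assuming that the net change in $\xi_{sp}$
with redshift is due only to changes in the linear bias of
the photometric sample, $b_p$. We further assume that
the evolution of the bias is linear, i.e., $b_p(z) = b_p(1) +
(db/dz)(z - 1)$ with constant $db/dz$, and that $db/dz$ is
small compared to $b_p(1)$. As usual, we take $\phi_p(z)$ to
be given by Equation [11] and adopt the simple-biasing
assumption $\xi_{sp} = (\xi_{ss}\,\xi_{pp})^{1/2}$, so
$\xi_{sp} \propto (b_p(z)/b_p(1))^{1/2}$.
Then, if $b_p$ varies but $w_{sp}$ is interpreted with a constant
$b_p$, to leading order in $db/dz$ the measured $\langle z \rangle$ will be off
by:

$$
\Delta\langle z \rangle
= \frac{\int \left(b_p(z)/b_p(1)\right)^{1/2} z\, g(z)\, dz}
       {\int \left(b_p(z)/b_p(1)\right)^{1/2} g(z)\, dz}
  - \frac{\int z\, g(z)\, dz}{\int g(z)\, dz}
\approx \frac{db/dz}{2\, b_p(1)}
  \left[ \frac{\int z\,(z-1)\, g(z)\, dz}{\int g(z)\, dz}
       - \frac{\int z\, g(z)\, dz \int (z-1)\, g(z)\, dz}
              {\left(\int g(z)\, dz\right)^{2}} \right]
= \frac{db/dz}{2\, b_p(1)}\, \sigma_z^{2}
\tag{21}
$$

applying the linear approximation $(1 + \epsilon)^{1/2} \approx (1 + \epsilon/2)$.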

By comparing the observed galaxy clustering to the
predicted clustering of dark matter in a $\sigma_8 = 0.9$ model
from Smith et al. (2003), Coil et al. (2006c) estimate
that the linear bias of $\sim L_*$ galaxies at $z \sim 1$ is
$1.48 \pm 0.04$. Applying the same methods at $z = 0$, the
correlation measurements of Zehavi et al. (2005) correspond
to a linear bias for $L_*$ galaxies in the Sloan Digital Sky
Survey of 1.0, differing by $\lesssim 0.5 \pm 0.1$. We therefore take
0.5 to be a reasonable upper limit on $|(db/dz)/b_p(1)|$. If
the bias of the photometric sample evolves but we do not
take that into account in our analysis, we would make a
systematic error of at most 0.0025 for $\sigma_z = 0.1$, pushing
the tolerances of future dark energy projects. In actuality,
however, we do possess information on bias evolution
(e.g., by measuring the mean $b_p$ over successive photometric
bins), so it is very unlikely this effect would lead
to an error that large.
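
The 0.0025 figure can be checked numerically. The following is a minimal sketch, assuming that $g(z)$ is a Gaussian centered on $z = 1$ and that the measured distribution is $g(z)$ reweighted by $(b_p(z)/b_p(1))^{1/2}$; the function name and grid settings are illustrative and are not taken from the paper.

```python
import numpy as np

def mean_z_shift_from_bias_evolution(dbdz_over_b1, sigma_z, z0=1.0):
    """Shift in <z> when g(z) is reweighted by (b_p(z)/b_p(1))**0.5.

    dbdz_over_b1 is (db/dz)/b_p(1); g(z) is assumed Gaussian, centered on z0.
    """
    z = np.linspace(z0 - 8.0 * sigma_z, z0 + 8.0 * sigma_z, 20001)
    g = np.exp(-0.5 * ((z - z0) / sigma_z) ** 2)       # unnormalized g(z)
    weight = np.sqrt(1.0 + dbdz_over_b1 * (z - 1.0))   # (b_p(z)/b_p(1))^(1/2)
    mean_measured = np.sum(z * weight * g) / np.sum(weight * g)
    mean_true = np.sum(z * g) / np.sum(g)
    return mean_measured - mean_true

# Upper limit adopted in the text: |(db/dz)/b_p(1)| = 0.5, with sigma_z = 0.1.
shift = mean_z_shift_from_bias_evolution(0.5, 0.1)
leading_order = 0.5 / 2.0 * 0.1 ** 2   # (db/dz) / (2 b_p(1)) * sigma_z^2
print(shift, leading_order)
```

The direct integral and the leading-order expression should agree closely here, since the next correction is of relative order $\sigma_z^2$.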

It is worth noting that spectroscopic surveys with mul- 
tiple selection techniques (e.g. by a variety of different 
color cuts) may select objects with very different bias ac- 
cording to the technique used. This could, for instance, 
lead to jumps in bias at the redshifts where a given sub- 
sample becomes relevant or irrelevant. In that case, it 
would be most effective to split the survey into its con- 
stituent subsamples, treating the clustering/bias of each 
subsample separately. 

4.2. Errors in measuring autocorrelations of 
spectroscopic samples 

A second potential issue is systematic errors in measuring
the autocorrelation function of the spectroscopic
sample, $\xi_{ss}$. Although random errors in modern surveys
are small, it is very difficult to reduce systematic errors
below 1-2%. These systematic errors will generally cause
$\xi_{ss}$ to be over- or under-estimated similarly at all $z$
covered by a given survey. To assess the impact of these
systematics, we test their worst-case impact using our
standard $\phi_p(z)$ distribution.

Thus, we assume that there are only two redshift surveys;
that one survey covers the complete $z < 1$ regime,
and another $z > 1$; and that the $\xi_{ss}$ measurement for
each survey may suffer an unknown systematic of RMS
fractional amplitude $\sigma_{sys}$, i.e., if $\sigma_{sys} = 0.02$, we expect
the measured amplitude of $\xi_{ss}$ (for similar galaxies at
the same redshift) to vary by 2% from survey to survey.
Thus, one half of the reconstructed $\phi_p(z)$ will have an
amplitude differing from the other by a factor $r_{sys}$ drawn
from a distribution with RMS $\sigma_{sys}$ and mean 1 (the
difference is $\sigma_{sys}$ rather than $2\sigma_{sys}$ due to the simple-biasing
assumption and linear approximation to the square root,
as in §4.1). Then:

$$
\Delta\langle z \rangle
= \frac{\int_{-\infty}^{1} z\, g(z)\, dz + r_{sys} \int_{1}^{\infty} z\, g(z)\, dz}
       {\int_{-\infty}^{1} g(z)\, dz + r_{sys} \int_{1}^{\infty} g(z)\, dz} - 1
= \frac{r_{sys} - 1}{r_{sys} + 1} \sqrt{\frac{2}{\pi}}\, \sigma_z
\approx \frac{r_{sys} - 1}{\sqrt{2\pi}}\, \sigma_z
\tag{22}
$$

taking $\sigma_{sys}$ to be small, so $r_{sys} \approx 1$. Propagating errors,
we then find that in a worst-case scenario, $\langle z \rangle$ will
have a root-mean-square bias of $(1/2\pi)^{1/2} \sigma_{sys} \sigma_z$, or more
conveniently, $8.0 \times 10^{-4} (\sigma_{sys}/0.02)(\sigma_z/0.1)$, well within
estimated tolerances for SNAP and LSST.
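
As a quick check of Equation 22, the sketch below evaluates the shift in $\langle z \rangle$ directly for a Gaussian $g(z)$ centered on $z = 1$ whose $z > 1$ half is rescaled by $r_{sys}$. The Gaussian form, the grid, and the function name are assumptions made for illustration only.

```python
import numpy as np

def mean_z_shift_two_surveys(r_sys, sigma_z, z_split=1.0):
    """Shift in <z> when the z > z_split half of g(z) is scaled by r_sys.

    g(z) is assumed Gaussian with mean z_split and width sigma_z.
    """
    z = np.linspace(z_split - 8.0 * sigma_z, z_split + 8.0 * sigma_z, 200001)
    g = np.exp(-0.5 * ((z - z_split) / sigma_z) ** 2)
    amplitude = np.where(z > z_split, r_sys, 1.0)   # survey-dependent scaling
    mean_measured = np.sum(z * amplitude * g) / np.sum(amplitude * g)
    mean_true = np.sum(z * g) / np.sum(g)
    return mean_measured - mean_true

sigma_sys, sigma_z = 0.02, 0.1
shift = mean_z_shift_two_surveys(1.0 + sigma_sys, sigma_z)
linearized = sigma_sys * sigma_z / np.sqrt(2.0 * np.pi)   # (1/2pi)^(1/2) sigma_sys sigma_z
print(shift, linearized)
```

For a one-sigma offset in $r_{sys}$ the direct result sits slightly below the linearized value, as expected from the $r_{sys} + 1$ term in the denominator.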

4.3. Field-to-field zero point variations 

A third factor not considered in our standard scenario
is spatial variation in the effective zero points of
the photometry used to define the photometric sample,
$p$ (due to seeing, calibration issues, etc.). Note that
random photometric errors or absolute zero point
uncertainties have no effect on redshift distributions
measured from cross-correlations; the method will empirically
determine $\phi_p(z)$ for whatever falls in a given
photometric redshift bin, regardless of whether a given
object is put in that bin due to errors or because it
rightfully belongs there. Instead, the principal impact of
zero point errors will be changes in the effective depth
of the sample between separately calibrated patches; an
error of $\Delta m$ magnitudes will lead to a fractional error in
number counts of $N^{-1} (dN/dM)\, \Delta m$. For $R$-band-limited
samples, $d(\log_{10} N)/dM = N^{-1} (\ln 10)^{-1}\, dN/dM \approx 0.36$
(Coil et al. 2004b), so the fractional variation in the
surface density of objects in the photometric sample ($\Sigma_p$)
will be approximately $0.83\, \sigma_{zp}$, where $\sigma_{zp}$ is the RMS
variation of the photometric zero point in magnitudes.
The logarithmic slope of galaxy number counts is larger
in $B$ ($\sim 0.5$) and slightly smaller for $I$ ($\sim 0.33$), leading
to modestly different prefactors for these cases.

Zero point variations will impact the errors in determining
$\phi_p(z)$ in two ways. The first is that, if different
spectroscopic surveys cover regions with different
photometric zero points, the cross-correlation signal will be
artificially boosted or decreased for each survey as $\sigma_{zp}$
varies, since the overall value of $\Sigma_p$ used for normalization
will be not quite appropriate for the effective magnitude
limit in each patch of sky. Again, we consider a
worst-case scenario, where one survey covering one set of
$N_{patch}$ independent calibration patches is used to
reconstruct $\phi_p(z)$ at $z < 1$, and another covering a separate
set of $N_{patch}$ patches is used for $z > 1$.

We may again apply the results of Equation [22]. Since
the fractional error in $w_{sp}$ for a single survey will be
$0.83\, \sigma_{zp} N_{patch}^{-1/2}$ (as we are averaging over $N_{patch}$
calibration patches), the RMS variation in $r_{sys}$ (the ratio of
the reconstructed $\phi_p(z)$ at $z > 1$ to $z < 1$) should be
$\sqrt{2} \times 0.83\, \sigma_{zp} N_{patch}^{-1/2}$. Thus, the worst-case RMS error
in a measurement of $\langle z \rangle$ due to zero point variations
is $2.3 \times 10^{-4} (\sigma_{zp}/0.01)(N_{patch}/4)^{-1/2}(\sigma_z/0.1)$, within
SNAP and LSST tolerances. Specifications for zero point
variations are generally smaller than this (e.g., 0.005
mag RMS zero point variation for LSST; cf. Burke et al.
2006), and ongoing redshift surveys should cover
many independently calibrated patches of sky, so zero
point variations are likely to have even smaller impact.
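
The arithmetic behind this estimate can be laid out step by step. The sketch below is illustrative only: the function name and argument names are hypothetical, and the final step reuses the $(1/2\pi)^{1/2}$ factor from Equation 22.

```python
import math

def zero_point_rms_error(sigma_zp, n_patch, sigma_z, dlog10N_dM=0.36):
    """Worst-case RMS error in <z> from zero point variations.

    sigma_zp   : RMS zero point variation (mag)
    n_patch    : independent calibration patches per survey
    dlog10N_dM : logarithmic slope of number counts (0.36 for R band)
    """
    # Fractional change in counts per magnitude: ln(10) * 0.36 ~ 0.83.
    frac_counts = dlog10N_dM * math.log(10.0) * sigma_zp
    # Averaging over patches reduces the per-survey error in w_sp.
    sigma_w = frac_counts / math.sqrt(n_patch)
    # r_sys is a ratio of two independent surveys, hence the sqrt(2).
    sigma_r = math.sqrt(2.0) * sigma_w
    # Equation 22: RMS bias in <z> is sigma_r * sigma_z / sqrt(2 pi).
    return sigma_r * sigma_z / math.sqrt(2.0 * math.pi)

prefactor = 0.36 * math.log(10.0)
error = zero_point_rms_error(sigma_zp=0.01, n_patch=4, sigma_z=0.1)
print(prefactor, error)
```

With the fiducial values this gives a result close to the $2.3 \times 10^{-4}$ quoted above.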

The second effect of zero point variations will be to in- 
crease the fluctuations in counts beyond those expected 
for Poisson errors (as assumed in Equation [§]) , even if 
there is only one redshift survey. However, for reason- 
able scenarios, this is minor; even in a very conservative 
scenario, with $dN_s/dz = 25{,}000$, $\Sigma_p = 10$ galaxies per
square arcmin, only 3 independently calibrated patches
of sky surveyed, and RMS zero point errors of 0.05 mag,
errors in mean $z$ and $\sigma_z$ from Monte Carlo tests increase
by a fraction of a percent of their value when this effect
is added to our standard scenario.

4.4. Errors in assumed cosmology 

As seen in Equation [5], transforming $w(z)$ to $\phi_p(z)$
requires knowledge of the volume element, $dV/dz\,d\Omega =
d_A(z)^2\, dl/dz$. Because the scale radii used are expressed
in $h^{-1}$ Mpc, all scalings with the Hubble parameter cancel
out; but $dV/dz\,d\Omega$ will depend on other cosmological
parameters as well. Similarly to the preceding sections,
we investigate this by assessing the error in $\langle z \rangle$ that will
result from assuming a mistaken cosmology. For convenience,
we consider only spatially flat, quintessence +
cold dark matter models characterized by a matter density
$\Omega_m$ and dark energy equation-of-state parameter $w$;
these are 0.3 and $-1$, respectively, for our standard
scenario.

We then determine the error in $\langle z \rangle$ that occurs if the
cosmology differs from the standard in one of these
parameters, but $w(z)$ is interpreted using the standard
cosmology. Let $V_{assumed}(z)$ and $V_{true}(z)$ denote the
values of $d_A(z)^2\, dl/dz$ for the assumed cosmological
model and the true cosmology, respectively. Then
mistaken assumptions will lead to an error in $\langle z \rangle$ given
by:

$$
\Delta\langle z \rangle
= \frac{\int_0^\infty z\, V_{assumed}(z)\, g(z)\, dz}
       {\int_0^\infty V_{assumed}(z)\, g(z)\, dz}
- \frac{\int_0^\infty z\, V_{true}(z)\, g(z)\, dz}
       {\int_0^\infty V_{true}(z)\, g(z)\, dz}
\tag{23}
$$

For $\sigma_z \lesssim 0.3$, the effect of varying $\Omega_m$ may
then be approximated well by $\Delta\langle z \rangle = 4.2 \times
10^{-4} (\sigma_z/0.1)^2 (\Delta\Omega_m/0.03)$, where we normalize to a 10%
variation in $\Omega_m$ (comparable to errors from WMAP;
iSpers-el et al1[200l . The effects of varying w are much 
less symmetric about our standard scenario, so we consider
$w < -1$ and $w > -1$ separately. The impact of
errors in cosmology is stronger in the former case:
approximately $\Delta\langle z \rangle = 7 \times 10^{-5} (\sigma_z/0.1)^{1.9} (\Delta w/0.1)$, where
$\Delta w$ is the offset of $w$ from $-1$. For $w > -1$, $\Delta\langle z \rangle$ has a turning point
at $w \approx -0.93$; the value of $\Delta\langle z \rangle$ at that turning point
is approximately 1.7 x 10~^((Tz/0.1)^'^. Thus, it is more 
important to constrain $\Omega_m$ than $w$ when using
cross-correlation techniques to determine $\phi_p(z)$, but
regardless, the cosmological uncertainties are small compared
to the requirements of proposed dark energy surveys.
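
The $\Omega_m$ scaling can be reproduced approximately by evaluating Equation 23 on a grid. The sketch below is a rough numerical check, not the calculation used in the paper: it assumes a Gaussian $g(z)$ centered on $z = 1$, comoving distances in units of $c/H_0$ (the normalization cancels in Equation 23), and simple trapezoidal integration. All function and argument names are hypothetical.

```python
import numpy as np

def volume_weight(z, omega_m, w=-1.0):
    """d_A(z)^2 dl/dz for a flat quintessence + CDM model, up to a constant.

    z must be an increasing grid starting at 0 so that the comoving
    distance can be accumulated with the trapezoidal rule.
    """
    e_z = np.sqrt(omega_m * (1.0 + z) ** 3
                  + (1.0 - omega_m) * (1.0 + z) ** (3.0 * (1.0 + w)))
    inv_e = 1.0 / e_z                                   # dl/dz in units of c/H0
    steps = 0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(z)
    d_c = np.concatenate(([0.0], np.cumsum(steps)))     # comoving distance
    return d_c ** 2 * inv_e

def mean_z_error(sigma_z, omega_m_true=0.3, w_true=-1.0,
                 omega_m_assumed=0.3, w_assumed=-1.0, z0=1.0):
    """Equation 23: difference of <z> weighted by V_assumed and by V_true."""
    z = np.linspace(0.0, z0 + 8.0 * sigma_z, 20001)
    g = np.exp(-0.5 * ((z - z0) / sigma_z) ** 2)        # assumed Gaussian g(z)
    v_a = volume_weight(z, omega_m_assumed, w_assumed)
    v_t = volume_weight(z, omega_m_true, w_true)
    return (np.sum(z * v_a * g) / np.sum(v_a * g)
            - np.sum(z * v_t * g) / np.sum(v_t * g))

# A 10% error in Omega_m (0.33 true versus 0.30 assumed) with sigma_z = 0.1.
delta = mean_z_error(0.1, omega_m_true=0.33)
print(abs(delta))
```

The same function accepts `w_true` values other than $-1$, so the asymmetric behavior in $w$ described above can be explored in the same way.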

5. CONCLUSIONS 

In this paper, we have described a new method 
for recovering the redshift distribution of objects in a 
photometric sample by measuring their angular cross- 
correlations with objects in redshift survey samples as a 
function of spectroscopic z. This technique does not re- 
quire that spectroscopic samples resemble the photomet- 
ric sample in galaxy properties (such as luminosity) or 
clustering amplitude. We have demonstrated that in re- 
alistic scenarios, the redshift distributions of photometric 
samples may be determined to the precision required by 
proposed dark energy experiments with this technique. 
We conclude here by discussing what can be done to 
optimize future redshift survey datasets to facilitate ap- 
plications of cross-correlation techniques. 

• First, it is apparent from Figure [1] that there are two 
redshift regimes that are currently much more poorly 
sampled than others: $0.2 \lesssim z \lesssim 0.7$, and $z > 1.4$.
Concentrating future survey efforts on these regimes would
be of great benefit for application of cross-correlation 
methods. Efforts to cover this lower-redshift gap are al- 
ready underway. 

• We emphasize that, although high redshift precision
is not requisite for the spectroscopic sample - e.g., ideal
photometric redshifts with $\sigma_z = 0.01$ would be useable
for determining the true redshift distribution for a sample
with $\sigma_z \gtrsim 0.1$ - it is vitally important that the purity
of the redshifts be very high. Otherwise, redshift outliers
in the spectroscopic sample will cause distortions in the
recovered $\phi_p(z)$. Of course, the same holds true for any
photometric redshift calibration technique; if false redshifts
are used to calibrate a redshift distribution, it is
quite likely to be biased in some way.

In most surveys, high-purity redshifts are only obtained
for a fraction of the sample. For instance, in the
DEEP2 Galaxy Redshift Survey, roughly 80% of successful
redshifts (i.e., ~ 55% of targeted galaxies) fall within
the highest purity class, which has a $\lesssim 0.5$% error rate
(based on tests with the > 2000 objects observed multiple
times), while the remainder have an error rate of
roughly 5% (Faber et al. 2006, in prep.). As another
example, in the VIMOS-VLT Deep Survey, ~ 20% of
galaxies targeted yield a redshift in their highest (99%)
redshift confidence category; almost all of those
high-confidence objects have $z < 1$ (Ilbert et al. 2005). In
contrast, in the Sloan Digital Sky Survey, which samples
local galaxies with much higher signal-to-noise spectra,
almost all galaxies yield an accurate redshift (Schlegel et
al., in prep.).

In general, the higher signal-to-noise spectra within a
sample will yield a greater rate of secure redshifts;
i.e., the brightest galaxies will dominate samples
of high-confidence redshifts. This can be a problem for 
direct calibration of photometric redshifts, but has lit- 
tle effect on cross-correlations; the spectroscopic sample 
need not include high-confidence redshifts of faint galax- 
ies, so long as a sufficient number of brighter galaxies 
at the same redshift are included. This allows larger, 
shallower surveys to be used to calibrate redshift distri- 
butions even of very faint photometric redshift samples. 

• Cross-correlations can be analyzed more simply if
nonlinearities have minimal impact; e.g., because the
spectroscopic sample has relatively weak biasing and
scale-dependence in its bias. This suggests that blue,
star-forming galaxies may be more useful for this technique
than red, early-type galaxies (Zehavi et al. 2005;
Coil et al. 2004a; Coil et al. 2007, in prep.). However,
red sequence galaxies can yield relatively high-precision
photometric redshifts; they therefore may make up for
this disadvantage by sheer abundance. Quasars might
make an attractive population to use at high redshift
given their high luminosity, but the difficulty of measuring
their autocorrelations with precision (Coil et al.
2006b) and the possibility that quasars may affect the
evolution of nearby galaxies may make their use to measure
cross-correlations over any but the largest scales
difficult.

Above all, it is important that future photometric dark
energy experiments overlap with spectroscopic surveys
on the sky. Without this, cross-correlation measurements
are impossible. These measurements are not only
useful for determining redshift distributions, but also
will allow correlation functions to be measured down to
much lower luminosities than can be reached
spectroscopically. Furthermore, the availability of more
photometry in fields with spectroscopy can improve our
understanding of the spectroscopic samples by broadening
spectral energy distribution measurements; having
photometry in five bands for each galaxy in the SDSS
spectroscopic sample has been a great boon to studies of
galaxy properties. The synergies between photometry
and spectroscopy are great, and determination of redshift
distributions from cross-correlations is only one of
many applications, though a vital one.

TABLE 2

Summary of Random and Systematic Errors

Error type                                          Corresponding uncertainty in $\langle z \rangle$

Random errors                                       $1.5 \times 10^{-3}\, (\sigma_z/0.1)\, (\Sigma_p/10)^{-0.3}\, ((dN_s/dz)/25{,}000)^{-1/2}$ ^a
Random errors, sample variance negligible           $1.0 \times 10^{-3}\, (\sigma_z/0.1)^{1.5}\, (\Sigma_p/10)^{-1/2}\, ((dN_s/dz)/25{,}000)^{-1/2}$
Not accounting for evolution in bias                $2.5 \times 10^{-3}\, \{[(db/dz)/b_p(1)]/0.5\}\, (\sigma_z/0.1)^{2}$
Systematic errors in autocorrelation measurements   $< 8.0 \times 10^{-4}\, (\sigma_{sys}/0.02)\, (\sigma_z/0.1)$
Field-to-field zero point variations                $< 2.3 \times 10^{-4}\, (\sigma_{zp}/0.01)\, (N_{patch}/4)^{-1/2}\, (\sigma_z/0.1)$
Errors in assumed $\Omega_m$                        $4.2 \times 10^{-4}\, (\sigma_z/0.1)^{2}\, (\Delta\Omega_m/0.03)$
Errors in assumed $w$                               $< 7 \times 10^{-5}\, (\sigma_z/0.1)^{1.9}\, (\Delta w/0.1)$

^a Throughout this table we give the surface density of the photometric sample, $\Sigma_p$, in galaxies per square arcminute.